
How to untag or process XML data type column along with other regular columns

Hi,

 

I have an unusual situation and need your help. I am new to Talend and still learning to use it efficiently. I will explain my question with an example:

1. I am extracting data from a DB2 table and this table has 4 columns:

       Column 1           Integer

       Column 2           Integer

       Column 3           XML_DATA

       Column 4           Date

2. Finally, I am loading all the data into a Hive table on HDFS. I am not sure what the Hive table structure should be.

Questions:

I do not know the list of XML tags in Column 3. Is it required to know the list of tags? How do I find it?

How do I untag Column 3? How can I achieve the following file format:

              column1;column2;untag column3;column4

Can "untag column 3" be split into multiple fields in the output file?

 

I understand that this is a lot of questions. Hopefully someone can help me. Thank you, I appreciate all your help.

 

 

1 ACCEPTED SOLUTION


Re: How to untag or process XML data type column along with other regular columns

If you want to extract XML tags into columns, you have 2 main choices:

 

  1. (My preference when extracting XML from a database to CSV) Use the database's XML functions:
    https://www.ibm.com/developerworks/data/library/techarticle/dm-0708nicola/
    https://www.ibm.com/developerworks/data/library/techarticle/dm-0606nicola/
  2. Use the Talend component tExtractXMLFields to extract the data from the XML.
    In both cases you need to know the tags.


  3. You can also look at parsing the XML on the Hive side.
    Examples: https://community.hortonworks.com/content/kbentry/972/hive-and-xml-pasring.html
     https://resources.zaloni.com/blog/xml-processing
    This also requires knowing the tags, but at a different step.
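
Since all three options need the tag list, here is a minimal sketch of how you could discover the tags from a sample value of the XML column and then flatten a row into the column1;column2;...;column4 layout from the question. It is plain Python with the standard library, not Talend or DB2 code, and it only illustrates the logic. The sample XML, the column values, and the function names tag_paths and flatten_row are made up for illustration. It assumes each tag appears at most once per document and ignores attributes.

```python
import xml.etree.ElementTree as ET


def tag_paths(xml_text):
    """Return the distinct leaf element paths of one XML value, in document order."""
    root = ET.fromstring(xml_text)
    paths = []

    def walk(elem, prefix):
        path = f"{prefix}/{elem.tag}"
        children = list(elem)
        if not children and path not in paths:
            paths.append(path)  # leaf element: this becomes one output field
        for child in children:
            walk(child, path)

    walk(root, "")
    return paths


def flatten_row(col1, col2, xml_text, col4, paths, sep=";"):
    """Build one delimited line: col1, col2, one field per known path, col4."""
    root = ET.fromstring(xml_text)
    values = []
    for path in paths:
        # Drop the root segment, because lookups are relative to the root element.
        relative = "/".join(path.strip("/").split("/")[1:])
        if relative:
            text = root.findtext(relative, default="")
        else:
            text = root.text or ""
        values.append((text or "").strip())  # a missing tag gives an empty field
    return sep.join([str(col1), str(col2), *values, str(col4)])


# Hypothetical sample value of the XML column.
sample = (
    "<customer><name>Ann</name>"
    "<address><city>Oslo</city><zip>0150</zip></address></customer>"
)

paths = tag_paths(sample)
line = flatten_row(1, 42, sample, "2018-01-15", paths)
print(paths)
print(line)
```

To get the full tag list you would run the discovery step over a representative set of rows and merge the results, because a single row may not contain every tag. Also note that a value containing the separator character would break the file format, so a real job would need quoting or a different delimiter.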

 

-----------