How to untag or process XML data type column along with other regular columns

Five Stars

Hi,

I have an unusual situation and need your help. I am new to Talend and still learning how to use it efficiently. I will explain my question with an example:

1. I am extracting data from a DB2 table that has 4 columns:

       Column 1           Integer
       Column 2           Integer
       Column 3           XML_DATA
       Column 4           Date

2. Finally, I load all the data into a Hive table on HDFS. I am not sure what the Hive table structure should be.

Questions:

I do not know the list of XML tags in Column 3. Do I need to know the list of tags? How do I find it?

How do I untag Column 3? How can I produce the following file format:

              column1;column2;untag column3;column4

Can "untag column3" be split into multiple fields in the output file?

 

I understand that this is a lot of questions. Hopefully someone can help me. Thank you, I appreciate all your help.


Accepted Solutions
Twelve Stars

Re: How to untag or process XML data type column along with other regular columns

If you want to extract XML tags into columns, you have three main options:

  1. (My preference when extracting XML from a database to CSV) Use the database's own XML functions, such as XMLTABLE or XMLQUERY in DB2:
    https://www.ibm.com/developerworks/data/library/techarticle/dm-0708nicola/
    https://www.ibm.com/developerworks/data/library/techarticle/dm-0606nicola/
  2. Use the Talend component tExtractXMLFields to extract the data from the XML column.
    In both of these cases you need to know the tags in advance.


  3. You can also parse the XML on the Hive side, for example:
    https://community.hortonworks.com/content/kbentry/972/hive-and-xml-pasring.html
    https://resources.zaloni.com/blog/xml-processing
    This also requires knowing the tags, but at a later step.
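
Since every option above needs the tag list, it may help to discover the tags from a sample of Column 3 first. Below is a minimal sketch in Python (standard library only), not a Talend or DB2 feature. The helper names `leaf_paths` and `flatten_row`, the sample XML, and the sample values are all hypothetical. It assumes the XML has no namespaces, that each leaf tag appears at most once per row, and that values do not contain the `;` separator.

```python
import xml.etree.ElementTree as ET


def leaf_paths(xml_text):
    """Return the sorted list of leaf element paths found in one XML value."""
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(elem, prefix):
        path = f"{prefix}/{elem.tag}"
        children = list(elem)
        if not children:
            # An element without child elements is treated as a field.
            paths.add(path)
        for child in children:
            walk(child, path)

    walk(root, "")
    return sorted(paths)


def flatten_row(col1, col2, xml_text, col4, paths, sep=";"):
    """Build one output line: column1;column2;<one field per path>;column4."""
    root = ET.fromstring(xml_text)
    fields = [str(col1), str(col2)]
    for path in paths:
        # Drop the root segment so the path is relative to the root element.
        relative = "/".join(path.strip("/").split("/")[1:])
        node = root.find(relative) if relative else root
        # A tag missing from this row becomes an empty field.
        fields.append((node.text or "").strip() if node is not None else "")
    fields.append(str(col4))
    return sep.join(fields)


# Hypothetical sample value standing in for Column 3.
sample = (
    "<customer><name>Ann</name>"
    "<address><city>Oslo</city><zip>0150</zip></address></customer>"
)

paths = leaf_paths(sample)
print(paths)
# ['/customer/address/city', '/customer/address/zip', '/customer/name']

print(flatten_row(1, 2, sample, "2018-01-31", paths))
# 1;2;Oslo;0150;Ann;2018-01-31
```

Run over a representative sample of rows, the union of the discovered paths gives a candidate field list, so "untag column3" becomes several fields in the output. The same paths are the kind of XPath expressions you would then supply to tExtractXMLFields or to a DB2 XMLTABLE query, and they suggest the columns for the Hive table.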

 
