How to untag or process XML data type column along with other regular columns

Hi,

 

I have an unusual situation and need your help. I am new to Talend and still learning to use it efficiently. I will explain my question with an example:

1. I am extracting data from a DB2 table that has 4 columns:

       Column 1           Integer
       Column 2           Integer
       Column 3           XML_DATA
       Column 4           Date

2. Finally, I am loading all the data into a Hive table on HDFS. I am not sure of the Hive table structure.

Questions:

I do not know the list of XML tags in Column 3. Do I need to know the tags in advance? If so, how do I find them?

How do I untag Column 3? How can I achieve the following file format:

              column1;column2;untag column3;column4

Can "untag column3" be split into multiple fields in the output file?

 

I understand that this is a lot of questions. Hopefully someone can help me. Thank you, I appreciate all your help.

 

 


Accepted Solutions

Re: How to untag or process XML data type column along with other regular columns

If you want to extract XML tags into columns, you have two main choices (1 and 2), plus a third option on the Hive side (3):

  1. (My preference when extracting XML from a database to CSV) Use the database's XML functions, such as XMLTABLE in DB2:
    https://www.ibm.com/developerworks/data/library/techarticle/dm-0708nicola/
    https://www.ibm.com/developerworks/data/library/techarticle/dm-0606nicola/
  2. Use the Talend component tExtractXMLFields to extract data from the XML.
    In both cases you need to know the tags.

  3. You can also parse the XML on the Hive side.
    Example: https://community.hortonworks.com/content/kbentry/972/hive-and-xml-pasring.html
     https://resources.zaloni.com/blog/xml-processing
    This also requires knowing the tags, just at a different step.
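
To make the idea concrete, here is a minimal sketch in plain Python (not Talend, DB2, or Hive code) of the two steps involved: listing the tag paths that occur in a sample of the XML column, then replacing the XML column with one field per chosen path. The sample XML, the tag names `order`, `customer`, `total`, and the helper names `discover_tag_paths` and `flatten_row` are all made up for illustration; your real tags will differ, and XML that uses namespaces would need extra handling.

```python
import csv
import io
import xml.etree.ElementTree as ET


def discover_tag_paths(xml_text):
    """Return the distinct element paths in one XML document, in document order."""
    root = ET.fromstring(xml_text)
    paths = []

    def walk(element, prefix):
        path = f"{prefix}/{element.tag}"
        if path not in paths:
            paths.append(path)
        for child in element:
            walk(child, path)

    walk(root, "")
    return paths


def flatten_row(col1, col2, xml_text, col4, xpaths):
    """Replace the XML column with one field per path (empty string if a tag is missing)."""
    root = ET.fromstring(xml_text)
    extracted = [(root.findtext(xp) or "").strip() for xp in xpaths]
    return [col1, col2, *extracted, col4]


# Hypothetical sample row; the real tag names must come from your own data.
sample_xml = "<order><customer>ACME</customer><total>19.90</total></order>"
rows = [(1, 10, sample_xml, "2019-08-01")]

# Step 1: find out which tags exist.
print(discover_tag_paths(sample_xml))

# Step 2: write column1;column2;<extracted fields>;column4
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter=";", lineterminator="\n")
for col1, col2, xml_text, col4 in rows:
    writer.writerow(flatten_row(col1, col2, xml_text, col4, ["customer", "total"]))
print(buffer.getvalue(), end="")
```

Running the discovery step over a sample of rows is one way to learn the tags when there is no schema to consult. The paths it reports are what you would then feed into the XPath-style column definitions of whichever option you pick above.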

 
