Reading complex xml in Hadoop HDFS

Hi,
Has anyone read or parsed deeply nested XML stored in a Hortonworks HDFS file system using Talend? Our requirement is that deeply nested raw XML files have already been landed in HDFS by a different process, and we need to read them with Talend and process them further. The volume is very high: there are 2000+ files of 50 MB each.
I heard that in 5.4.1 we can generate native MapReduce code (not Pig or Hive code) that runs inside Hadoop. Please share your experience if you have worked on this type of problem.
Thanks
Subra
6 REPLIES
Moderator

Re: Reading complex xml in Hadoop HDFS

Hi,
From your description, you can use tHDFSConnection --> OnSubjobOk --> tHDFSList --> tHDFSGet to copy your files from HDFS to the local machine's disk and then process them further.
For your high volume of files, tHDFSList retrieves a list of files or folders based on a filemask pattern and iterates over each item.
Best regards
Sabrina
--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.
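
The filemask step described in the reply above can be illustrated outside Talend. The following is a minimal sketch, assuming a glob-style mask and plain file names; it uses Java's standard `PathMatcher`, not Talend's or Hadoop's own implementation, and tHDFSList's mask syntax may differ in detail. The class name `FileMaskFilter` and the sample file names are hypothetical.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: keeps only the file names that match a glob-style mask.
class FileMaskFilter {

    static List<String> select(List<String> fileNames, String mask) {
        // "glob:" selects Java's built-in glob syntax (*, ?, [abc], {a,b}).
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + mask);
        List<String> selected = new ArrayList<>();
        for (String name : fileNames) {
            if (matcher.matches(Paths.get(name))) {
                selected.add(name);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Sample names are made up for illustration.
        List<String> names = Arrays.asList("orders_001.xml", "orders_002.xml", "readme.txt");
        // Iterate over each matching item, as a list-then-iterate flow would.
        for (String name : select(names, "orders_*.xml")) {
            System.out.println(name);
        }
    }
}
```

With the sample names above, only the two `.xml` files are selected and iterated.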

Re: Reading complex xml in Hadoop HDFS

Hi,
The requirement is to parse the files within Hadoop itself, using Talend's MapReduce capabilities, without pulling them to a local machine. We use 5.3.1 and will move to 5.4.1 soon. Please let us know if there is any such feature we can use.
Thanks,
Swami.
Moderator

Re: Reading complex xml in Hadoop HDFS

Hi,
So far, Talend Open Studio for Big Data cannot achieve your goal.
The Talend Enterprise subscription version can meet your needs. Feel free to contact us.
Best regards

Re: Reading complex xml in Hadoop HDFS

Hi,
We are trying to achieve this with the Talend Enterprise version for Big Data. We need to parse a complex XML file with multiple levels of nested tags. Please let us know how this can be done with TIS 5.3.1 MapReduce, as it is one of our requirements. If Talend can help us leverage Hadoop MapReduce to parse the XML, it will make our job simpler.
Thanks,
Swami.
Moderator

Re: Reading complex xml in Hadoop HDFS

Hi,
With the Enterprise subscription product, please open a Jira issue in the DI project on the Talend Bug Tracker requesting MapReduce XML components, and our developers will build a custom one for you.
Best regards
Sabrina

Re: Reading complex xml in Hadoop HDFS

We are using Talend Enterprise Big Data version 5.5 and are facing a similar issue. We cannot write a MapReduce job in Talend BD to parse the XML data in HDFS :( Is there a way to do this using custom code?
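
On the custom-code question, one possible starting point is a streaming (StAX) parser, which avoids loading a whole nested document into memory. The following is a minimal sketch, not a Talend component or a confirmed Talend approach: it flattens nested elements into `path=value` lines using only the Java standard library. The class name `NestedXmlFlattener` and the sample XML are hypothetical. It is assumed, but not shown here, that inside a Hadoop job the `InputStream` would come from Hadoop's `FileSystem.open(Path)` and that each file is processed whole by one task rather than split mid-document.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: flattens nested XML into "/path/to/element=text" lines.
class NestedXmlFlattener {

    static List<String> flatten(InputStream in) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Landed raw files are untrusted input, so DTDs and external entities are disabled.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        // Merge adjacent text and CDATA chunks into a single CHARACTERS event.
        factory.setProperty(XMLInputFactory.IS_COALESCING, true);
        XMLStreamReader reader = factory.createXMLStreamReader(in);

        Deque<String> path = new ArrayDeque<>(); // currently open elements, root first
        List<String> rows = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    path.addLast(reader.getLocalName());
                } else if (event == XMLStreamConstants.CHARACTERS) {
                    String text = reader.getText().trim();
                    if (!text.isEmpty()) {
                        rows.add("/" + String.join("/", path) + "=" + text);
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    path.removeLast();
                }
            }
        } finally {
            reader.close();
        }
        return rows;
    }

    // Convenience wrapper for in-memory strings; wraps the checked exception.
    static List<String> flattenString(String xml) {
        try {
            return flatten(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
    }

    public static void main(String[] args) {
        // Sample document is made up for illustration.
        String xml = "<order><id>7</id><customer><name>Ann</name>"
                + "<address><city>Oslo</city></address></customer></order>";
        for (String row : flattenString(xml)) {
            System.out.println(row);
        }
    }
}
```

For the sample document, the rows are `/order/id=7`, `/order/customer/name=Ann` and `/order/customer/address/city=Oslo`. Attributes and repeated sibling elements are not handled specially in this sketch; a real job would need to decide how to represent them.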