
Parse XML elements/attributes from an XML file

Hi,
I am new to Talend Big Data and am using it to work with Hadoop.
I stored an XML file in HDFS with the tHdfsPut component and can read the whole file with the tPigLoad component; both work fine.
My requirement, however, is to extract some element and attribute data from that XML file in HDFS using the tPig components.
I am using the tPigLoad, tPigCode and tPigStoreResult components, and I think my tPigCode query is incorrect.
Can anyone tell me what the tPigCode query should be? I have spent many days on this without getting any output.
How can I extract the data from the XML and write it to a file?
The XML format is:
<?xml version="1.0" encoding="UTF-8"?>
<CustomerDetails>
<CustomerName name="shikha" id="12"/>
</CustomerDetails>
I have created two columns, "name" and "id", in the schema.
The tPigCode query is:
tPigCode_1_RESULT = foreach tPigLoad_1_RESULT generate $0 as name, $1 as id;

Please tell me the basic settings for each component and the correct tPigCode query.
Thanks
Shikha Tyagi
1 REPLY
Employee

Re: Parse XML elements/attributes from an XML file

Hi,
Your Pig query is fine as far as the relation names go: tPigCode_1_RESULT and tPigLoad_1_RESULT are the correct ones.
However, it is not the right way to parse XML. You would need to use XMLLoader after registering it in your Pig script.
Unfortunately, this is not yet possible in Talend. A feature request has already been opened in our JIRA for it.
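For reference, in a plain Pig script run outside Talend, the XMLLoader approach might look roughly like the sketch below. It assumes that piggybank.jar is available and that the file lives at the path shown; the paths and the relation names `raw` and `customers` are hypothetical. The attributes are pulled out with a simple regular expression, which suits the small sample above but is not a general XML parser.

```pig
-- Assumption: piggybank.jar is available at this (hypothetical) path.
REGISTER /path/to/piggybank.jar;

-- XMLLoader yields one chararray per <CustomerDetails>...</CustomerDetails> block.
-- The input path is hypothetical.
raw = LOAD '/user/talend/customer.xml'
      USING org.apache.pig.piggybank.storage.XMLLoader('CustomerDetails')
      AS (doc:chararray);

-- Extract the attribute values from the element text with the built-in REGEX_EXTRACT.
-- This is a regex sketch for the sample document, not a full XML parser.
customers = FOREACH raw GENERATE
    REGEX_EXTRACT(doc, 'name="([^"]*)"', 1) AS name,
    (int) REGEX_EXTRACT(doc, 'id="([^"]*)"', 1) AS id;

-- Write the extracted columns as comma-separated text (hypothetical output path).
STORE customers INTO '/user/talend/customer_out' USING PigStorage(',');
```

The whole `CustomerDetails` block is loaded instead of the self-closing `CustomerName` tag, since XMLLoader's handling of self-closing tags may vary by Pig version.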
Regards,