Five Stars

Select a specifc XML node (from a repeating node) with TXMLMap

I'm currently building a DataWarehouse and I have to fetch all sales data from POS into DB. I am able to extract and load the data to MSSQL DB, but for one of the xml node I actually would like to get the second node instead. The XML schema with the desire output as below:

 

XML schema.PNG

 

The job is very simple like below.

Job.PNG

 

If you notice here the <CloseTime> having 2 child nodes one for previous day business day and EOD, another for the current day bizday and eod.

I need to fetch the second node to combine with the sales header information. 

 

XMLData.PNG


Currently only the first node is fetch and successfully combine with the sales header information, but it should be the second node to be combine with the sales header. How do I achieve this in TXMLMap? Or do I need to use another component just to do that?

 

This is my first time using Talend so if you could recommend a solution with steps it would be best!

 

Thanks.

  • Data Integration
18 REPLIES
Eleven Stars

Re: Select a specifc XML node (from a repeating node) with TXMLMap

You will need to use a different component for this. When reading from XML I rarely use a tXMLMap since it is very restrictive. The best component (in my opinion) for working with reading XML is the tExtractXMLField component, used with XPath queries. XPath can be a pain to learn, but you can test things out online very easily (http://www.xpathtester.com/xpath).

 

For your problem here you will be wanting to return the nth instance of a loop. This problem I believe is answered here (I haven't tested it but it looks right)....

https://stackoverflow.com/questions/4007413/xpath-query-to-get-nth-instance-of-an-element 

Rilhia Solutions
Five Stars

Re: Select a specifc XML node (from a repeating node) with TXMLMap

Hi, is that not possible using tXMLMap?

I need to extract the XML and split into different table in database eventually.

 

Job.PNG

 

Basically the tFileInputXML will take every nodes from the root and tXMLMap will loop accordingly.

 

With tExtractXMLField I'm not sure if it can do the similar to break the XML content into seperate tables based on the node.

Eleven Stars

Re: Select a specifc XML node (from a repeating node) with TXMLMap

You could try moving your "loop" in your tXMLMap to the CloseType's Day element....but I suspect you have set this to Header for a reason, and you cannot have 2 of these.

 

The tExtractXMLField component gives you so much more control. You can do what you want to achieve using that with a tMap or a tXMLMap component

Rilhia Solutions
Five Stars

Re: Select a specifc XML node (from a repeating node) with TXMLMap

Yes, I need to loop the Header because it has many receipt node and one header in each receipt node. I need to retrieve every header nodes and store in a table in DB. Then the next thing will be line node - there can be multiple line node in each receipt node, so I need to loop through line nodes to retrieve and store in another table in DB. Sames goes to Payment node. So basically it looks like this:

 

<DailySales>

    <Receipt>

        <Header>...</Header>

        <Line>...</Line>

        <Line>...</Line>

        <Line>...</Line>

        <Payment>...</Payment>

    </Receipt>

</DailySales>

 

I'm able to do it using tExtractXMLField, but when I loop on header so I supposed this to retrieve for each and every header node. So for line and payment nodes do I drop another tExtractXMLField? Currently it look like this:

 

 Job2.PNG

 

I'm not able to output form the tFileInputXML_1 using Main again, the option left is Row -> Reject/Iterate. My objective is to extract from each xml files and split it into 3 DB tables.

 

Please advise, thanks.

Eleven Stars

Re: Select a specifc XML node (from a repeating node) with TXMLMap

Get the data as granular as you need it using the tExtractXMLField component and then connect to a tMap component. Use the tMap component to split your data into 3 streams

Rilhia Solutions
Five Stars

Re: Select a specifc XML node (from a repeating node) with TXMLMap

If I loop at the <Line> level, means there will be many duplicating header and payment information. How do I put this into tmap to split it accordingly and without duplication of header and payment information?

Eleven Stars

Re: Select a specifc XML node (from a repeating node) with TXMLMap

With the tExtractXMLField component you can extract a sub-document. Set the column type to be Document and tick the "Get Nodes" tick box. You can pass a separate document for each loop instance to the next component.

Rilhia Solutions
Five Stars

Re: Select a specifc XML node (from a repeating node) with TXMLMap

I've tried that, so for the first tExtractXMLField, I used <header> as the loop, then in the column I past in <Line> as document.

Strangely the output is only giving me the very first <line> in first <receipt> with many repetition (since I'm looping header).

 

I also tried using <receipt> (one level up from header/line/payment) as the xpath loop but the result is same.

Five Stars

Re: Select a specifc XML node (from a repeating node) with TXMLMap

Ok I tried again and this time is success, but may I know what should be the flow?

 

I thought about tFileList -> tFileInputXML -> tExtractXMLField (header only) -> tExtractXMLField (line only).

 

Where do I put the tmap here? how do I explicitly fetch header to SalesHeader table in DB and fetch line to SalesLine table in DB?

I thought I could do tFileInputXML and having multiple output to 3 tExtractXMLField (header,line,payment), but seems like only one row-> main is allowed.

 

Eleven Stars

Re: Select a specifc XML node (from a repeating node) with TXMLMap

It's difficult to say for sure because I don't have your data (and this takes a little bit of trial and error in a lot of cases), but I would suggest putting your tMap after the tExtractXMLField (line only).

 

Another thing that might solve a few issues for you. Lets say the your first tExtractXMLField component output the following columns....

 

id (int)
name (String)

age (int)

header (Document)

 

...and you want to process the header document in the next tExtractXMLField component BUT want to output the id, name and age. You can simply add the id, name and age columns to your next tExtractXMLField component and leave them blank where you would normally configure the XPath. This will act as a pass through. So your next column config in the second tExtractXMLField component might look like below....

id (int)
name (String)

age (int)

header_data_1 (String)

header_data_2 (String)

header_data_3 (String)

header_data_4 (String)

 

This is quite an abstracted example, but have a play and see what you get.

 

Rilhia Solutions
Five Stars

Re: Select a specifc XML node (from a repeating node) with TXMLMap


rhall_2_0 wrote:

It's difficult to say for sure because I don't have your data (and this takes a little bit of trial and error in a lot of cases), but I would suggest putting your tMap after the tExtractXMLField (line only).

 

Another thing that might solve a few issues for you. Lets say the your first tExtractXMLField component output the following columns....

 

id (int)
name (String)

age (int)

header (Document)

 

...and you want to process the header document in the next tExtractXMLField component BUT want to output the id, name and age. You can simply add the id, name and age columns to your next tExtractXMLField component and leave them blank where you would normally configure the XPath. This will act as a pass through. So your next column config in the second tExtractXMLField component might look like below....

id (int)
name (String)

age (int)

header_data_1 (String)

header_data_2 (String)

header_data_3 (String)

header_data_4 (String)

 

This is quite an abstracted example, but have a play and see what you get.

 


Oh man, the highlighted part is the life saver! I tried to figure it out and to no avail and your clue just come in time!

 

 

Eleven Stars

Re: Select a specifc XML node (from a repeating node) with TXMLMap

I thought that might be a problem. It all seems to logical until you try to figure out how to pass extracted data through a tExtractXMLField component for the first time :-)

Talend is a really powerful tool, but there is a lot that is not made immediately obvious

Rilhia Solutions
Five Stars

Re: Select a specifc XML node (from a repeating node) with TXMLMap

Hmm I think this is not what I want to achieve still..

 

Because the tmap after tExtractXMLField (line only) will not get the fields for header. Means I have to get back to one tExtractXMLField with the loop on <line>. So for e.g. my xml ideally have 400 <header> and 1000 <line> (it's a 1-m relationship), but since I loop on <line> I will get 1000 <header> with 1000 <line>.

 

If I put this in tMap, will it be able to separate into 400 sales header and 1000 sales line into DB? If so how do I configure tMap to do so?

 

Eleven Stars

Re: Select a specifc XML node (from a repeating node) with TXMLMap

I will need to see a complete XML structure. You mention "Line" but I am not sure how that relates to "Header". You say that when you get the Line records you get the same amount of Header records. This makes sense. You are flattening the data into rows. This will mean that the following structure....

<xml>
<data1>d1</data1>
<data2>d2</data2>
<data3>d3</data3>
 <loops>
<loop>
<ldata1>ld1a</ldata1>
<ldata2>ld2a</ldata2>
</loop>
<loop>
<ldata1>ld1b</ldata1>
<ldata2>ld2b</ldata2>
</loop>
<loop>
<ldata1>ld1c</ldata1>
<ldata2>ld2c</ldata2>
</loop>
</loops>
</xml>

....will look like below when it is flattened....

d1,d2,d3,ld1a,ld2a
d1,d2,d3,ld1b,ld2b
d1,d2,d3,ld1c,ld2c

The first 3 columns are multiplied by the number of loops. This actually makes perfect sense. If you only want to see the data once, then you need to output to two tables (in this case).

You can use logic to do that in your case. You can either use an existing piece of data as a "key" to see when each loop has finished, or you can add one. Now if I were to split this data to two outputs (which would make sense....two tables), I would put these columns into a tMap and send the first 3 to one output and the last 2 to the other output. I would then use a tAggregateRow to group my data (reduce duplicates to 1 row) using the "key" I mentioned earlier.

 

It takes a bit of thinking about, but it is quite straight forward once you've logically solved it. You have to consider XML as a mini relational database. Think of it like that (considering possible primary keys and foreign keys) and you will crack this.

 

 

Rilhia Solutions
Five Stars

Re: Select a specifc XML node (from a repeating node) with TXMLMap

Thanks for that! Will definitely try on that.
One side question. I'm actually building a data warehouse based on the sales data from POS, what do you think should I store the data in the flattened data (denormalised) or should I normalise it? These data is actually store for future reference in case I need to build different data mart to answer different questions.

After the first load from XML to DB, I'm going to create different table for different summarise data such as sales by item by store per day, sales by transaction count by item by store per day etc.

I understand that with denomalize data it is better in read performance, which I supposed Datewarehouse is more heavily on reading for analytics purpose. What is your saying on this? (This is my first time building data warehouse so need some advise)
Eleven Stars

Re: Select a specifc XML node (from a repeating node) with TXMLMap

You shouldn't really ask multiple questions in one question because useful answers to the community can get lost. However, this isn't really specific to Talend. The question of normalized vs denormalized data is a lot more complicated than what I'd feel comfortable answering in a few lines. It also requires a lot more information. You have a grasp of the basics (from your description), what I suggest you do is learn more about your requirements (both functional and non-functional) and decide based on that. There is little point normalizing data that will ultimately be queried in the same way it was entered....unless of course space is an issue. It's always a weigh up between space and time

Rilhia Solutions
Five Stars

Re: Select a specifc XML node (from a repeating node) with TXMLMap

I'm able to achieve what I wanted, but sometimes the CloseTime <Day> element only appear once (when the store is newly opened and no previous <Day> element). Currently I am using TMap and the expression is /CloseTime/Day[2]/bizDate, but if the <Day> node appear once then this will return null. How do I specify if it is null then take Day[1]/bizDate instead? 

Eleven Stars

Re: Select a specifc XML node (from a repeating node) with TXMLMap

Return both values and then decide (maybe in a tMap) based on whether the 2nd date returned is null or not.

Rilhia Solutions