Four Stars

Question on aggregating combinations of XML fields

 

In this example, the input has multiple cheeses, some with multiple characteristics and colors. Is it possible to create one row of CSV output for each characteristic and color combination containing both of their values along with the containing Cheese element's name value? An example demonstrating the goal is shown below.

 

<Cheeses>
<Cheese>
<Name>Swiss</Name>
<Characteristics>
<Characteristic>holes</Characteristic>
</Characteristics>
<Colors>
<Color>white</Color>
</Colors>
</Cheese>
<Cheese>
<Name>Muenster</Name>
<Characteristics>
<Characteristic>rind</Characteristic>
</Characteristics>
<Colors>
<Color>orange</Color>
<Color>yellow</Color>
</Colors>
</Cheese>
<Cheese>
<Name>Blue</Name>
<Characteristics>
<Characteristic>mold</Characteristic>
<Characteristic>pungent</Characteristic>
</Characteristics>
<Colors>
<Color>yellow</Color>
<Color>blue</Color>
</Colors>
</Cheese>
</Cheeses>

 

I'm trying to make a CSV file from that input that looks like this.

 

Name,Characteristic,Color
Swiss,holes,white
Muenster,rind,yellow
Muenster,rind,orange
Blue,mold,yellow
Blue,mold,blue
Blue,pungent,yellow
Blue,pungent,blue

 

When I set the Cheese element as a loop element in a tXMLMap, also setting the Characteristic and Color elements as loop elements drops the looping over the Cheese elements. I've also tried an outer join between a tFileInputXML looping on the Characteristic element and a tFileInputXML looping on the Color element. However, the join key would be the Name element's value, which I'd have to reference with the relative path ../../Name. That path did not show up in the map editor, whether it was already in the repository metadata or I added it to the XML tree manually. Am I missing something, or is there another way to accomplish this?

2 REPLIES
Twelve Stars

Re: Question on aggregating combinations of XML fields

Take a look at the tExtractXMLField component. You can see how to use it here (https://help.talend.com/reader/QgrwjIQJDI2TJ1pa2caRQA/IQUk6wt8M75q8vUo~UPbiQ). You essentially have three loops, with two of them nested inside the first. The first tExtractXMLField will be used to extract the cheeses. When you extract the records, you will need to extract the two nested loops as Documents with the "Get Nodes" box ticked. This will give you a row with your "Name" value and two Documents ("Characteristics" and "Colors"). Pass this data to a tMap and split it into two flows: one going to a tExtractXMLField that extracts the "Characteristics" and one going to a tExtractXMLField that extracts the "Colors". You can pass the "Name" field through both of these tExtractXMLField components. This will result in two datasets: a "Characteristics" dataset and a "Colors" dataset. Load these into tHash components. Now all you need to do is join the two datasets in another subjob using the "Name" field as the key. From here it should be pretty straightforward.
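
For anyone who wants to see the extract-then-join logic outside of Talend, here is a minimal sketch in plain Python. It is not a Talend job and not generated code. It only mirrors the steps described above: build a (Name, Characteristic) dataset and a (Name, Color) dataset, then join them on Name. The function names extract_pairs, join_on_name and to_csv are made up for illustration, and the XML is the sample from the question.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Sample input copied from the question.
XML_INPUT = """<Cheeses>
<Cheese><Name>Swiss</Name>
<Characteristics><Characteristic>holes</Characteristic></Characteristics>
<Colors><Color>white</Color></Colors></Cheese>
<Cheese><Name>Muenster</Name>
<Characteristics><Characteristic>rind</Characteristic></Characteristics>
<Colors><Color>orange</Color><Color>yellow</Color></Colors></Cheese>
<Cheese><Name>Blue</Name>
<Characteristics><Characteristic>mold</Characteristic><Characteristic>pungent</Characteristic></Characteristics>
<Colors><Color>yellow</Color><Color>blue</Color></Colors></Cheese>
</Cheeses>"""


def extract_pairs(root, list_tag, item_tag):
    """Hypothetical helper: one (Name, value) row per nested item.

    Plays the role of one extraction branch that carries "Name" through.
    """
    rows = []
    for cheese in root.findall("Cheese"):
        name = cheese.findtext("Name")
        for item in cheese.findall(f"{list_tag}/{item_tag}"):
            rows.append((name, item.text))
    return rows


def join_on_name(characteristics, colors):
    """Hypothetical helper: inner join of the two datasets on Name."""
    # Lookup table keyed by Name, similar in spirit to a hashed lookup flow.
    colors_by_name = {}
    for name, color in colors:
        colors_by_name.setdefault(name, []).append(color)
    return [
        (name, characteristic, color)
        for name, characteristic in characteristics
        for color in colors_by_name.get(name, [])
    ]


def to_csv(rows):
    """Render the joined rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Name", "Characteristic", "Color"])
    writer.writerows(rows)
    return buf.getvalue()


root = ET.fromstring(XML_INPUT)
characteristics = extract_pairs(root, "Characteristics", "Characteristic")
colors = extract_pairs(root, "Colors", "Color")
rows = join_on_name(characteristics, colors)
print(to_csv(rows), end="")
```

Note that this sketch does an inner join, so a cheese with characteristics but no colors (or the reverse) would produce no rows. It also assumes that "Name" is unique per cheese. If either assumption does not hold for your data, the join step needs a different key or an outer join.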

Rilhia Solutions
Four Stars

Re: Question on aggregating combinations of XML fields

Thanks a lot. When I get around to it, I'll see if that answer works, though it may take some time because I'm new to Talend and not yet familiar with hashes and subjobs.