Transforming txt file with header and footer into 1 xml file


I have a text file with a header on the first line and a footer at the last line as follows:

I want to transform this into an XML file containing one header element, as many body elements as there are detail lines, and one footer element. Example:
<?xml version="1.0" encoding="ISO-8859-15"?>
<Header TRX_TYPE="PRI" STORE_ID="21861"/>
<Price CD_RONA="0951074" CD_UPP="70001" EFFECTIVE_DATE="20030818" RETAIL_UOM_FR="CH" CONVRESION_FACTOR="03125" PRIX_DETAIL="075" Column7="92" RETAIL_UOM_EN="EA" Column9="" Column10="" Column11="" Column12="" Column13=""/>
<Price CD_RONA="0951074" CD_UPP="70001" EFFECTIVE_DATE="20030818" RETAIL_UOM_FR="CN" CONVRESION_FACTOR="100000" PRIX_DETAIL="2400" Column7="92" RETAIL_UOM_EN="CN" Column9="" Column10="" Column11="" Column12="" Column13=""/>
<Price CD_RONA="0951076" CD_UPP="773110394412" EFFECTIVE_DATE="20030818" RETAIL_UOM_FR="CH" CONVRESION_FACTOR="03125" PRIX_DETAIL="085" Column7="92" RETAIL_UOM_EN="EA" Column9="" Column10="" Column11="" Column12="" Column13=""/>
<Footer LINE_COUNT="5" STORE_ID="21861"/>

I was able to use the tMap component to define three distinct routes, as shown in the attached image 't_Map_multiOutput.jpg', outputting three different XML files: one for the header, one for the body and one for the footer. I have also tried the append feature of the tAdvancedFileOutputXML component. But since all three mappings run almost simultaneously, the append fails: the file stream is still open when another route tries to open it again.
The attached header mapping image shows the mappings of the Header section in tMap as an example (file: header_mapping.jpg).
So I worked around the problem by mapping each section sequentially and appending to the same XML file. I've done this through three subjobs, each of which appends to the XML file only once the previous subjob is done. See the 'sequential_mapping.jpg' image to see what I mean. Ignore the top input and the bottom output; they're only the FTP GET of the received txt file and the FTP PUT of the generated XML file.
The tMap in each subjob deals with a specific section of the file, as you can see: one for the header, one for the body and one for the footer. So I split the three outputs of my original tMap into a single output per tMap in the new subjobs.
The output works great and I get everything as expected. But I was wondering if there's a better way to do this simple task in Talend without resorting to sequential mappings.
Can't I, for example, ask the tAdvancedFileOutputXML component to wait for the stream to be closed before executing its current output task? This would greatly simplify my transformation, as it would look like my original job (first attached image, 't_Map_multiOutput.jpg') with the added advantage of not failing on opening the file stream when appending to my XML file.

Any suggestion would be great.

Re: Transforming txt file with header and footer into 1 xml file

Another question
Does Talend support appending data to JSON output files, the same way tAdvancedFileOutputXML does for XML?

Re: Transforming txt file with header and footer into 1 xml file

Hello, what you're trying to do is actually quite simple once you introduce a new component. Start with a tFileInputMSDelimited, the multiple-schema delimited file component, and configure it with your three schemas. Once you've done this, make sure the tHashInput and tHashOutput components are turned on (project settings -> designer -> palette settings -> move "Technical" to the right).
Put three tHashOutput components on the job designer white board and connect the row links from the tFileInputMSDelimited to them. Then add three tHashInputs to the white board, and place a single tFileOutputMSXML on it as well. Configure each tHashInput to work with one of the tHashOutputs: same schema, with the tHashOutput chosen from the dropdown. Now connect the tHashInputs to the tFileOutputMSXML.
Double-click the tFileOutputMSXML to bring up the editor and, for each of the three rows, select "import an XML Tree" and import the XML sample your output needs to conform to. For your header row, remove the other two loop elements and populate the attributes. Repeat this for the other two inputs, mapping them to their specific subnodes. I've attached some images that show what all of this should look like, including the output.
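Outside of Talend, the idea behind this job (split the file into its three record types, hold them in memory, then write them in order through a single output) can be sketched in plain Java. This is only an illustration and not what Talend generates. It assumes a semicolon-delimited input where the first line is the header and the last line is the footer; the class name, the shortened body attribute list and the sample values are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class MultiSchemaToXml {
    // Attribute names borrowed from the example output; the body list is shortened.
    static final String[] HEADER_ATTRS = {"TRX_TYPE", "STORE_ID"};
    static final String[] PRICE_ATTRS = {"CD_RONA", "CD_UPP", "EFFECTIVE_DATE"};
    static final String[] FOOTER_ATTRS = {"LINE_COUNT", "STORE_ID"};

    /** Escapes characters that are unsafe inside a double-quoted XML attribute. */
    static String escape(String value) {
        return value.replace("&", "&amp;")
                    .replace("<", "&lt;")
                    .replace(">", "&gt;")
                    .replace("\"", "&quot;");
    }

    /** Builds one empty element, pairing attribute names and values by position. */
    static String element(String tag, String[] names, String[] values) {
        StringBuilder sb = new StringBuilder("<").append(tag);
        for (int i = 0; i < names.length && i < values.length; i++) {
            sb.append(' ').append(names[i])
              .append("=\"").append(escape(values[i])).append('"');
        }
        return sb.append("/>").toString();
    }

    /** First line becomes Header, last line Footer, everything between Price. */
    static String toXml(List<String> lines, String delimiter) {
        if (lines.size() < 2) {
            throw new IllegalArgumentException("need at least a header and a footer line");
        }
        String sep = Pattern.quote(delimiter);

        // Buffer each record type separately (the role the hash components play).
        String header = element("Header", HEADER_ATTRS, lines.get(0).split(sep, -1));
        List<String> body = new ArrayList<>();
        for (String line : lines.subList(1, lines.size() - 1)) {
            body.add(element("Price", PRICE_ATTRS, line.split(sep, -1)));
        }
        String footer = element("Footer", FOOTER_ATTRS,
                lines.get(lines.size() - 1).split(sep, -1));

        // Write everything in order through one output, so nothing is reopened.
        StringBuilder out = new StringBuilder();
        out.append(header).append('\n');
        for (String row : body) {
            out.append(row).append('\n');
        }
        out.append(footer).append('\n');
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        lines.add("PRI;21861");                      // made-up header line
        lines.add("0951074;70001;20030818");         // made-up body lines
        lines.add("0951076;773110394412;20030818");
        lines.add("4;21861");                        // made-up footer line
        System.out.print(toXml(lines, ";"));
    }
}
```

Because all three sections go through one writer in sequence, the "stream still open" conflict from the parallel append attempt cannot occur in this shape.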
Hope that answers your question.

Re: Transforming txt file with header and footer into 1 xml file

Works like a charm with the added benefit of not having to add a <root> element in my output XML file. In my original solution, Talend was forcing me to keep the <root> element (the name 'root' could be something else, but the element is mandatory).
I knew Talend could do it better. It's good to know that there's a Technical palette with the tHashInput and tHashOutput components. They're very helpful.
Thank you rpbaldwin for this awesome solution and palette hint.

Also, I've tried loading a file with 20k lines of prices for performance testing. My original solution takes around 18 seconds for the transformation. With this new solution, it takes just under 10 seconds! Major performance improvement here!
Now I will see if I can do the same with an output in JSON format.


Re: Transforming txt file with header and footer into 1 xml file

I believe that for JSON, I have no choice but to output three different JSON files. There is no tOutputMSJSON available in Talend.
I'd be happy to write Java/Groovy code to merge my three JSON files, but we're about to write at least 50 jobs like the one above. Replicating this code across all of these jobs would make no sense. A tOutputMSJSON component would be the right way to go, if it existed of course.
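For reference, the kind of merge code I have in mind is small. Below is a minimal sketch, assuming each of the three files holds exactly one valid JSON value; the class name, the method names and the "header"/"body"/"footer" keys are placeholders I picked, and nothing here validates the input. Kept in one shared place (for example a custom routine, if that fits the project), it would at least not need to be copied into every job.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class JsonMerge {

    /**
     * Wraps three already-serialized JSON values into one JSON object.
     * The inputs are trusted to be valid JSON; no parsing is done.
     */
    static String merge(String headerJson, String bodyJson, String footerJson) {
        return "{\"header\":" + headerJson.trim()
             + ",\"body\":" + bodyJson.trim()
             + ",\"footer\":" + footerJson.trim() + "}";
    }

    /** Reads the three section files and writes the merged document to target. */
    static void mergeFiles(Path header, Path body, Path footer, Path target)
            throws IOException {
        String merged = merge(read(header), read(body), read(footer));
        Files.write(target, merged.getBytes(StandardCharsets.UTF_8));
    }

    private static String read(Path path) throws IOException {
        return new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Made-up sample values, only to show the shape of the result.
        System.out.println(merge(
                "{\"STORE_ID\":\"21861\"}",
                "[{\"CD_RONA\":\"0951074\"}]",
                "{\"LINE_COUNT\":\"5\"}"));
    }
}
```

This only concatenates strings, so it stays cheap for large body files as long as they fit in memory, but it is still a workaround and not a replacement for a real multi-schema JSON output component.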

Re: Transforming txt file with header and footer into 1 xml file

Hi everyone!
I'm developing a lot of services to transform text files and XML files to JSON. Unfortunately, I'm stuck at this same point, and I need a component that appends to a complex structured JSON file, like a "tOutputMSJSON".
Anybody found a solution for this problem?
Thanks for any help!
Rafael Trein

