Split XML file into multiple XML files based on info in tags

Six Stars

I'm using TOS Big Data and I'm trying to read in an XML file and separate out some of the data into different XML files based on some of the info.  As an example if I have an XML file with books in it, I want some books to go to one file and others to go to a different file based on the ISBN defined in a child element of the book.  I've been looking for info on how to do this easily, but haven't come across anything yet.  Any suggestions?

Community Manager

Re: Split XML file into multiple XML files based on info in tags

This isn't necessarily hard, but it isn't necessarily easy either; it all depends on the input and output schemas. We will need a lot more detail about both before we can help.

Six Stars

Re: Split XML file into multiple XML files based on info in tags

I can't share a specific example because it's customer data, but the general idea is to evaluate an element within an XML file (like Book) and send it, along with all elements under it, to different output XML files based on the data in one of its child elements. I've attached a sample XML file that illustrates this with books.

 

Logic might be something like:

If genre="Computer" then send the book (and child elements) to output XML file 1

If genre="Fantasy" then send the book (and child elements) to output XML file 2
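
For illustration only, the routing rule above can be sketched outside Talend in a few lines of Python. This is not a Talend job; it just makes the intended behaviour concrete. The element names (`catalog`, `book`, `title`, `genre`), the sample data, and the output file names are all assumptions, since the attached sample is not shown here, and `genre` is assumed to be a child element of `book`.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Invented sample data; the element names are assumptions, not the real file.
SAMPLE = """<catalog>
  <book><title>Book A</title><genre>Computer</genre></book>
  <book><title>Book B</title><genre>Fantasy</genre></book>
  <book><title>Book C</title><genre>Computer</genre></book>
</catalog>"""

# Hypothetical routing table: genre value -> output file name.
ROUTES = {"Computer": "output1.xml", "Fantasy": "output2.xml"}


def split_by_genre(xml_text, routes):
    """Group whole <book> subtrees under a new root element per output file."""
    source_root = ET.fromstring(xml_text)
    outputs = {}
    for book in source_root.findall("book"):
        genre = (book.findtext("genre") or "").strip()
        target = routes.get(genre)
        if target is None:
            continue  # no rule for this genre, so the book is skipped
        if target not in outputs:
            outputs[target] = ET.Element(source_root.tag)
        outputs[target].append(book)  # the book keeps all of its children
    return outputs


def write_outputs(outputs, directory):
    """Write each grouped root element to its own XML file."""
    paths = []
    for name, root in outputs.items():
        path = os.path.join(directory, name)
        ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)
        paths.append(path)
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for path in write_outputs(split_by_genre(SAMPLE, ROUTES), tmp):
            print(path, len(ET.parse(path).getroot().findall("book")))
```

Books whose genre has no entry in the routing table are silently skipped in this sketch; a real job would probably want a reject or default output for them.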

Community Manager

Re: Split XML file into multiple XML files based on info in tags

OK, it is quite hard to give a detailed answer based on this. However, if your input structure contains no more than one looping section, it may be quite easy to achieve this using a tXMLMap component. If you are dealing with multiple looping sections, you will have to use the tExtractXMLField component. This requires a bit of knowledge of XPath queries, but is far more powerful than a tXMLMap. With the tExtractXMLField component, you would then need a tMap to send the extracted data in different directions.
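
To make the "extract with XPath, then route" idea concrete, here is a minimal Python sketch of the same pattern. It is not Talend configuration: one XPath selects the repeating (loop) element, relative XPaths pull each field into a flat row, and a separate step routes the rows by a field value. The sample data, the element names, and the names `LOOP_XPATH`, `FIELD_XPATHS`, `extract_rows`, and `route_rows` are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Invented sample data; the structure is an assumption.
SAMPLE = """<catalog>
  <book><title>Book A</title><isbn>111</isbn><genre>Computer</genre></book>
  <book><title>Book B</title><isbn>222</isbn><genre>Fantasy</genre></book>
  <book><title>Book C</title><isbn>333</isbn><genre>History</genre></book>
</catalog>"""

# XPath for the repeating element, relative to the document root.
LOOP_XPATH = "./book"
# Column name -> XPath relative to each loop element.
FIELD_XPATHS = {"title": "title", "isbn": "isbn", "genre": "genre"}


def extract_rows(xml_text, loop_xpath, field_xpaths):
    """Turn each loop element into a flat dict of column values."""
    root = ET.fromstring(xml_text)
    return [
        {column: node.findtext(path) for column, path in field_xpaths.items()}
        for node in root.findall(loop_xpath)
    ]


def route_rows(rows, column, routes, default="rejects"):
    """Send each row to the output named by its column value."""
    routed = {}
    for row in rows:
        routed.setdefault(routes.get(row[column], default), []).append(row)
    return routed


if __name__ == "__main__":
    rows = extract_rows(SAMPLE, LOOP_XPATH, FIELD_XPATHS)
    routed = route_rows(rows, "genre", {"Computer": "out1", "Fantasy": "out2"})
    for output, output_rows in routed.items():
        print(output, [r["title"] for r in output_rows])
```

Note that flattening to rows discards anything not listed in the field mapping, so every child element you want in the output has to be extracted explicitly.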

 

To build the XML (after extracting the data and sending it down the relevant path), you will probably end up using a tXMLMap component. However, if your output XML is complicated (has more than one looping section), you may need a more involved approach when building it.
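
As a rough picture of the simple case (a single looping section in the output), rebuilding XML from flat rows could look like the Python sketch below. It is only an illustration, not a tXMLMap setup; the function name `build_document`, the tag names, and the sample rows are assumptions.

```python
import xml.etree.ElementTree as ET


def build_document(rows, root_tag="catalog", loop_tag="book"):
    """Build one root element with one loop element per row."""
    root = ET.Element(root_tag)
    for row in rows:
        item = ET.SubElement(root, loop_tag)
        for column, value in row.items():
            # Each column becomes a child element; None becomes empty text.
            ET.SubElement(item, column).text = "" if value is None else str(value)
    return root


if __name__ == "__main__":
    # Invented rows standing in for data already routed to one output.
    fantasy_rows = [{"title": "Book B", "genre": "Fantasy"}]
    print(ET.tostring(build_document(fantasy_rows), encoding="unicode"))
```

An output with several nested repeating elements would need more than this single loop, which is the more complicated case mentioned above.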

By the way, I am assuming you are using the Open Source Edition of Talend. If so, the above stands. If you are using the Enterprise Edition, then you will have access to the Talend Data Mapper (TDM). This is much more powerful at working with XML, but will require a training course. If you purchased the Enterprise Edition, chances are you will have purchased some training. If so, make sure you take the TDM training.

Six Stars

Re: Split XML file into multiple XML files based on info in tags

Thanks for the suggestions.  I'll look into tExtractXMLField to see if that can help.
