Four Stars

Split XML file into multiple XML files based on info in tags

I'm using Talend Open Studio (TOS) for Big Data and I'm trying to read in an XML file and split some of its data into different XML files based on values in the data. As an example, if I have an XML file with books in it, I want some books to go to one file and others to go to a different file, based on the ISBN defined in a child element of the book. I've been looking for info on how to do this easily, but haven't come across anything yet. Any suggestions?

  • Big Data
  • Data Integration
4 REPLIES
Eleven Stars

Re: Split XML file into multiple XML files based on info in tags

This isn't necessarily hard, but it isn't necessarily easy either. It all depends on the input and output schemas. We will need a lot more info before we can help.

Rilhia Solutions
Four Stars

Re: Split XML file into multiple XML files based on info in tags

I can't share a specific example because it's customer data, but the general idea is to evaluate an element within an XML file (like Book) and send it, along with all elements under it, to different output XML files based on the data in one of its child elements. I've attached a sample XML file that illustrates this with books.

 

Logic might be something like:

If genre="Computer" then send the book (and child elements) to output XML file 1

If genre="Fantasy" then send the book (and child elements) to output XML file 2

Eleven Stars

Re: Split XML file into multiple XML files based on info in tags

OK, it is quite hard to give a detailed answer based on this. However, if your input structure does not contain more than one looping (repeating) section, it may be quite easy to achieve this using a tXMLMap component. If you are dealing with multiple looping sections, you will have to use the tExtractXMLField component. This requires a bit of knowledge of XPath queries, but is far more powerful than a tXMLMap. With the tExtractXMLField component you would then need a tMap to send the extracted data in different directions.
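To make the XPath idea a bit more concrete, here is a rough sketch in plain Java, outside Talend, of the same pattern: one loop XPath that selects each book, one relative XPath that reads the field to route on, then a split by value. It is not what tExtractXMLField or tMap generate internally. The element names `catalog`, `book` and `genre` are guesses at the attached sample, and `BookRouter` is just an illustrative name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, for illustration only.
class BookRouter {

    // Groups every <book> node by the text of its <genre> child.
    // Assumes a layout like <catalog><book><genre>..</genre></book></catalog>.
    static Map<String, List<Node>> groupByGenre(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();

            // "Loop" XPath: one result node per book.
            NodeList books = (NodeList) xpath.evaluate(
                    "/catalog/book", doc, XPathConstants.NODESET);

            Map<String, List<Node>> groups = new LinkedHashMap<>();
            for (int i = 0; i < books.getLength(); i++) {
                Node book = books.item(i);
                // Relative XPath, evaluated against the current book.
                String genre = xpath.evaluate("genre", book).trim();
                groups.computeIfAbsent(genre, g -> new ArrayList<>()).add(book);
            }
            return groups;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse or query the XML", e);
        }
    }
}
```

Calling `BookRouter.groupByGenre(xml).get("Computer")` would give you the book nodes destined for the first file. In a Talend job, the loop XPath and the relative field XPath play the same roles in the tExtractXMLField settings, and the grouping step corresponds to the filtered outputs of the tMap.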

 

To build the XML (after extracting the data and sending it down the relevant path), you will probably end up using a tXMLMap component. However, if your output XML is complicated (has more than one looping section), you may need a more involved approach when building it.
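For the simple case with one looping section, building the output amounts to creating a new root and copying the matching elements, children included, underneath it. Below is a minimal sketch of that step in plain Java, again outside Talend and not a description of how tXMLMap works internally. The `catalog`, `book` and `genre` names are assumptions about the sample file, and `GenreFileBuilder` is a made-up name.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, for illustration only.
class GenreFileBuilder {

    // Returns a new <catalog> document (as a string) holding only the
    // books whose <genre> child equals the requested genre.
    static String buildForGenre(String xml, String genre) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document source = builder.parse(new InputSource(new StringReader(xml)));

            // Fresh output document with its own root element.
            Document target = builder.newDocument();
            Element root = target.createElement("catalog");
            target.appendChild(root);

            NodeList books = source.getElementsByTagName("book");
            for (int i = 0; i < books.getLength(); i++) {
                Element book = (Element) books.item(i);
                NodeList genres = book.getElementsByTagName("genre");
                boolean matches = genres.getLength() > 0
                        && genre.equals(genres.item(0).getTextContent().trim());
                if (matches) {
                    // Deep copy: the book and all of its child elements.
                    root.appendChild(target.importNode(book, true));
                }
            }

            // Serialize the new document.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(target), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not build the output XML", e);
        }
    }
}
```

Calling `buildForGenre(xml, "Computer")` and `buildForGenre(xml, "Fantasy")` would produce the content for output files 1 and 2. To write straight to disk, a `StreamResult` built from a `java.io.File` can be passed to `transform` in place of the `StringWriter`.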

By the way, I am assuming you are using the Open Source Edition of Talend. If so, the above stands. If you are using the Enterprise Edition, then you will have access to the Talend Data Mapper (TDM). This is much more powerful at working with XML BUT will require a training course. If you purchased the Enterprise Edition, chances are you will have purchased some training. If so, make sure you take the TDM training.

Rilhia Solutions
Four Stars

Re: Split XML file into multiple XML files based on info in tags

Thanks for the suggestions.  I'll look into tExtractXMLField to see if that can help.