One Star

Parse multi schema XML message received from tJMS endpoint

Greetings all,
This new-be has been been struggling all afternoon how to parse a multi schema XML message received from a tJMS endpoint.
What I want to accomplish is to parse and loop through the different schemas in this XML and perform INSERT statements (preferably transactional for the complete XML message) to an Oracle database into several tables.
I tried to import the XSD into Metadata using the File XML import wizard. First thing I noticed is the XML file import wizard does not support multi schema XML/XSD files (for the target schema, I can only specify a single XPath).
When I would create for each schema a separate definition in the Metadata (some work, but not a problem), I do not understand how I could use these separate definitions to "split" the incoming tJMS XML message so I can start mapping the XML onto the tables.
Next, I have been looking at the *MSXML* components, but for some reason they are only available for file based transports (or I just cannot find the option to switch from file to flow like I can with the tXSDValidator component for instance).
Anyone can help me out pointing me to the components available in TOS-ESB?
Thanks in advance,
Robin Huiser
6 REPLIES
Employee

Re: Parse multi schema XML message received from tJMS endpoint

Hi Robin,
sounds to me that the tXMLMap would be the right component for your case (even so I not really knolw your XML Schema)
So try to
1) put the entire tJMS message into a single 'Document' Talend Schema Column. (Document = XML Data within the studio)
2) Import the XSD of your JMS Message to the 'File XML' or directly use it in tXMLMap (right click on the 'payload' / root element to specify your JMS Message (XML Schema) for the left hand side (input) of the tXMLMap
3) Define for each Oracle Table an output flow for the tXMLMap and mapp the input to the related output.
(you might also need to play a bit around with the 'All at once' option (in the header of the left hand / right hand tXMLMap mapping windows) but essentially by using a combinatio nof tJMSxxx, tXMLMap and tOracle you should get it done (for sure it depends a bit on how complex you XSD is but you can also use multiple tXML sequentially to maybe easier split and map parts of your XML Input message to the right Talend Schema elements to finally use these as Column values for your tORACLExxxx database inserts.
HTH,
Dietmar
One Star

Re: Parse multi schema XML message received from tJMS endpoint

Thanks, sounds good, will get started!
One Star

Re: Parse multi schema XML message received from tJMS endpoint

Hello rhuiser - how did you finally resolve this? I have a similar situation.
I had built the job using a single XML file to start with. I used the tFileInputMSXML component, created multiple schemas for each node or table (my XML is pretty complex with tons of loops). Then I joined the outputs from the previous step using a tMap and got my final output.
Now I need to integrate this job with an upstream JMS Queue. The challenge is that the queue spits out hundreds of XML 'files' in each stream; I need to loop through the stream, get each XML 'file' and pass to my job.
I would like to know how you ended up splitting your file, and how my approaches fares in relation to yours.
Thanks.
One Star

Re: Parse multi schema XML message received from tJMS endpoint

For illustration of my dilemma on how best to split my stream coming from the queue, here's how I could describe the issue:
<?xml version="1.0" encoding="UTF-8"?>
<PurchaseOrder>
<POHeader id="8965447">
<PONumber>9010999T</PONumber>
<PaymentInfo>
<Terms>
<Discount unit="percent">0</Discount>
<Days>25</Days>
</Terms>
</PaymentInfo>
</POHeader>
</PurchaseOrder>
<?xml version="1.0" encoding="UTF-8"?>
<PurchaseOrder>
<POHeader id="321488">
<PONumber>9010449T</PONumber>
<PaymentInfo>
<Terms>
<Discount unit="percent">5</Discount>
<Days>29</Days>
</Terms>
</PaymentInfo>
</POHeader>
</PurchaseOrder>
I'd like to parse my huge file and stream out just one "XML" at a time (such that it starts from the <?xml version=... and end on the "</PurchaseOrder>" tag). The question is how do I iterate through the file? I could probably get creative with the tMap component and add a counter that increments when every "<?xml version="1.0" encoding="UTF-8"?>" tag is encountered. And then I could in the next step filter out all the rows using this key.
Would this be the best approach?
How did you end up feeding data from your queue into a database?
Thanks!
Will
One Star

Re: Parse multi schema XML message received from tJMS endpoint

Robin,
I dove straight into my problem without addressing your original question... I had already done that piece using the tXMLMap (What Dietmar described).
What you do is to create as many 'reads' from the same file for particular nodes in your XML, then join them to the top-most or root node in your XML within the tXMLMap, just like you'd do in the tMap component. (See first screenshot attached).
On the question of how to process multiple 'XML's from the queue; one approach was to process each XML as it came from the queue; the problem is, I feared that the queue drain rate would exceed my XML parsing rate, so it would create a bottleneck. So I opted to drain the queue first into a delimited file, then process it in bulk. Now when the queue is drained, I have a file that is not a valid XML file top to bottom - as described earlier. I have repeating XML declarations between each transaction and I'm missing an opening and closing root tag.
What I did was to read the file top to bottom, on each row, check whether or not there was an XML declaration, and used it to remove all except the very first declaration, add an opening <POS> root tag, and finally add a closing </POS> tag. See screenshots... Works perfectly and fast...
Here are the variables in the 3rd screenshot:
varXMLTag
(row3.message != null && row3.message.length() > 0) ?
(
(StringHandling.LEFT(row3.message,5).equals("<?xml")) ? 1 : 0
) : 0
varXMLRowCounter
Var.varXMLTag == 1 ? Numeric.sequence("XMLsequence", 1, 1) : Numeric.sequence("XMLsequence",1,0)
varRowCounterByBatch
context.XMLFileBatchSize > 0 ?
(int)(((double)Var.varXMLRowCounter) % ((double)context.XMLFileBatchSize)) : 0

varBatchCounter
context.XMLFileBatchSize > 0 ?
(int)Math.ceil(((double)Var.varXMLRowCounter) / ((double)context.XMLFileBatchSize)) : 0
varMessage
(row3.message != null && row3.message.length() > 0) ?
(
Var.varXMLTag == 1 ? row3.message + "<POS>" : row3.message
) : " "
Filter:
// only allow first row with XML tag, and all subsequent rows that do not have XML tags
row1.XMLTag == 1 && row1.XMLRowCounterByBatch == 1 ||
row1.XMLTag == 0 && row1.XMLRowCounterByBatch >= 1
One Star

Re: Parse multi schema XML message received from tJMS endpoint

Hi willm,
I am exactly facing the similar situation, will you please explain the role of context.XMLFileBatchSize
in your job and logic to load it.
Thanks and Regards,
S.Arasu