I have an XML document which contains data nested at various levels. As a result of this, I am unable to create a single tExtractXMLField component - instead I need 3+ of them, setting the xPath query loop as needed to get the object data I want.
I am using a tMap component to split the original XML document into multiple paths. Imagine the first path is for "Accounts", and the second path is for "Contacts", that need to be associated to an Account.
The first path is set to UPSERT (update if exists, insert if it doesn't) accounts to a Salesforce system. The second path is set to UPSERT contacts to Salesforce. But before it can UPSERT these contacts, the job needs to query Salesforce to retrieve the Record Id of the Accounts that were just created in the first path.
I intend to query Salesforce for these Account Record Ids using a tSalesforceInput component, but I don't know how to make this query fire AFTER the Account UPSERT path completes and BEFORE the Contact UPSERT starts.
Can someone help describe how to re-merge these paths from tMap and set a dependency between them?
There are a couple of ways to solve this problem, but since you appear to have the first part sorted (collecting your data from the XML), I'll go with this suggestion: use the tHashOutput and tHashInput components. You cannot merge data within the same subjob once it has been split in the way you describe. To get around this, once you have split and processed your data, save it using tHashOutput components (three in your case) at the end of your processing within the subjob. Then start a new subjob and connect these three data sources using tHashInput components and a tMap.
Documentation on the tHashInput can be found here: https://help.talend.com/reader/wDRBNUuxk629sNcI0dNYaA/LTqBBurBnnkWIPVTY769dA
The tHashOutput component documentation can be found here: https://help.talend.com/reader/wDRBNUuxk629sNcI0dNYaA/0sI~h1CP_FvEiRphSMQ17Q
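As a rough illustration of the idea only (this is plain Python, not Talend code or any Talend API), the pattern amounts to caching each split flow in memory in one step and joining the cached flows in a later step. The names hash_store, split_and_cache and join_from_cache, and the field names, are hypothetical, and only two flows are shown instead of three to keep the sketch short.

```python
from typing import Dict, List

# Hypothetical in-memory stand-in for what tHashOutput writes and tHashInput reads.
hash_store: Dict[str, List[dict]] = {}


def split_and_cache(records: List[dict]) -> None:
    """First 'subjob': split the input into two flows and cache each one."""
    accounts: Dict[str, dict] = {}
    contacts: List[dict] = []
    for rec in records:
        # One account per key; later records with the same key overwrite earlier ones.
        accounts[rec["account_key"]] = {
            "account_key": rec["account_key"],
            "account_name": rec["account_name"],
        }
        contacts.append(
            {"account_key": rec["account_key"], "contact_name": rec["contact_name"]}
        )
    hash_store["accounts"] = list(accounts.values())
    hash_store["contacts"] = contacts


def join_from_cache() -> List[dict]:
    """Second 'subjob': read both cached flows and join them on account_key."""
    accounts_by_key = {a["account_key"]: a for a in hash_store["accounts"]}
    joined = []
    for contact in hash_store["contacts"]:
        account = accounts_by_key.get(contact["account_key"])
        if account is not None:  # inner join: drop contacts without an account
            joined.append({**contact, "account_name": account["account_name"]})
    return joined


# Made-up sample data, for illustration only.
sample = [
    {"account_key": "A1", "account_name": "Acme", "contact_name": "Ann"},
    {"account_key": "A1", "account_name": "Acme", "contact_name": "Bob"},
    {"account_key": "A2", "account_name": "Globex", "contact_name": "Cy"},
]
split_and_cache(sample)        # must finish before the join starts
merged = join_from_cache()
```

The point of the sketch is the ordering: the join only runs after the first step has finished writing, which is the dependency a second subjob gives you in the job design.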
Hi - thanks for your suggestion. After learning that I could not merge the outputs of tMap anymore, I decided to write the XML payload to a file, and just read it in multiple times using tFileInputXML over multiple subjobs. This allows me to set the dependencies I need between subjobs, so it solves the problem.
This is not an efficient approach. A trick that might be useful is that you can use multiple tExtractXMLField components one after the other AND (the really useful bit) you can pass values from the first through the subsequent ones AND (an even more useful bit) you can extract sub-XML sections. As an example, if you start with XML that looks like this....
<MYCDData>
  <ID>
    <Name>Richard</Name>
    <Age>39</Age>
  </ID>
  <CATALOG>
    <CD>
      <TITLE>Empire Burlesque</TITLE>
      <ARTIST>Bob Dylan</ARTIST>
      <COUNTRY>USA</COUNTRY>
      <COMPANY>Columbia</COMPANY>
      <PRICE>10.90</PRICE>
      <YEAR>1985</YEAR>
    </CD>
    <CD>
      <TITLE>Hide your heart</TITLE>
      <ARTIST>Bonnie Tyler</ARTIST>
      <COUNTRY>UK</COUNTRY>
      <COMPANY>CBS Records</COMPANY>
      <PRICE>9.90</PRICE>
      <YEAR>1988</YEAR>
    </CD>
  </CATALOG>
</MYCDData>
You can extract the data from here and produce 2 rows of data looking like this....
Name, Age, Title, Artist, Country, Company, Price, Year
Richard, 39, Empire Burlesque, Bob Dylan, USA, Columbia, 10.90, 1985
Richard, 39, Hide your heart, Bonnie Tyler, UK, CBS Records, 9.90, 1988
....using two tExtractXMLField components one after the other. The first one extracts the <ID> data and both <CD> nodes (as nodes, not as individual records). The ID data is retrieved as Name and Age and is accompanied by the 2 CD nodes, which multiplies the rows by 2 (one for each CD section). The second tExtractXMLField acts as a pass-through for the Name and Age fields (just create the columns in the schema and set nothing for them in the component); the XML you pass into this component is the column that holds the CD node. That column is treated as a separate XML document, and you can extract the Artist, Title, etc. from it. This data is output along with the ID data that has passed through, giving the rows you see above.
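To make the two-pass idea concrete outside of Talend, here is a minimal sketch in plain Python using the standard xml.etree.ElementTree module. It is only an analogy for the two chained tExtractXMLField components, not what Talend generates; the function names extract_stage_one and extract_stage_two and the intermediate "CD" column name are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

XML_DOC = """<MYCDData>
  <ID><Name>Richard</Name><Age>39</Age></ID>
  <CATALOG>
    <CD>
      <TITLE>Empire Burlesque</TITLE><ARTIST>Bob Dylan</ARTIST>
      <COUNTRY>USA</COUNTRY><COMPANY>Columbia</COMPANY>
      <PRICE>10.90</PRICE><YEAR>1985</YEAR>
    </CD>
    <CD>
      <TITLE>Hide your heart</TITLE><ARTIST>Bonnie Tyler</ARTIST>
      <COUNTRY>UK</COUNTRY><COMPANY>CBS Records</COMPANY>
      <PRICE>9.90</PRICE><YEAR>1988</YEAR>
    </CD>
  </CATALOG>
</MYCDData>"""


def extract_stage_one(xml_text):
    """First pass: one row per CD node, carrying Name, Age and the CD node as XML text."""
    root = ET.fromstring(xml_text)
    name = root.findtext("ID/Name")
    age = root.findtext("ID/Age")
    return [
        {"Name": name, "Age": age, "CD": ET.tostring(cd, encoding="unicode")}
        for cd in root.findall("CATALOG/CD")
    ]


def extract_stage_two(rows):
    """Second pass: pass Name and Age through, parse each CD column as its own document."""
    result = []
    for row in rows:
        cd = ET.fromstring(row["CD"])  # the CD column is treated as a separate XML doc
        result.append({
            "Name": row["Name"],
            "Age": row["Age"],
            "Title": cd.findtext("TITLE"),
            "Artist": cd.findtext("ARTIST"),
            "Country": cd.findtext("COUNTRY"),
            "Company": cd.findtext("COMPANY"),
            "Price": cd.findtext("PRICE"),
            "Year": cd.findtext("YEAR"),
        })
    return result


rows = extract_stage_two(extract_stage_one(XML_DOC))
for row in rows:
    print(", ".join(row.values()))
```

The intermediate rows from the first pass hold the CD node as a string, which mirrors passing the CD node through as a column into the second component.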
This is a much more efficient way of dealing with XML data. While it can be quite confusing at first, you quickly start to think of it like working with multiple tables in an RDBMS. After all, XML is essentially a mini database in a flat file when you think about it.