I'm trying to capture XML response data from an API, apply some masking, and then write the file to disk whilst maintaining the original structure.
Originally I had a tRESTClient -> tXMLMap, as this allows me to import the schema and then map it to the same target (with masking applied). However, tXMLMap does not support grouping over multiple input loops, so I've had to find another approach.
I read online that to overcome this limitation we should use tAdvancedXMLOutput as this supports multiple input loops. However, this component requires a Linker source to do the mapping whereas my input is simply a Document type and therefore has no fields that I can map! It's almost like I need a combination of both of these components but I can't find any solution that works.
I thought that I might have to save the file locally and then re-import it (which I really want to avoid), but even that doesn't seem valid, as I would have to either read the XML into a tabular structure (via tXMLInput) or read it back into memory as a DOM object, which is the whole problem I'm having in the first place!
This could help
Could you please share a screenshot of your current job flow, along with a screenshot of the component where the conversion is happening?
Please find my screenshots attached. Here's a further explanation:
talend_call_api - This job is metadata driven, i.e. I have several parent tRunJobs that execute this job in parallel. They pass down their parameters via Context variables, e.g. API_ENDPOINT. The job then calls a REST API in a loop, as there are limits on how many records we can return in a single request. The loop repeats until all records have been returned, storing each result XML in memory using HashOutput. This approach is schema agnostic and can be used for any API, as the XML is stored as a Document type.
talend_xml_doc - This screenshot simply confirms the Document type that is being used for storing the XML response via the Body field. The other flow is simply used to give the loop the next navigation link to iterate.
talend_xml_map - This is the component I'm having problems with. I capture the XML response via a schema-specific flow, i.e. for each XML response I have a specific SubJob that applies a schema against it. My goal is very simple: I want to map the input schema to an exact copy of itself on the output, so that I can mask the fields that are sensitive. The problem is that I can't apply multiple group / loop levels within this component, and therefore my output XML becomes corrupted. For example, NatureTypes and Statuses are both elements that I need to loop over and group by the top-level application element. This functionality is simply not available in this component: you can only group / loop on one child node.
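For anyone trying to follow the talend_call_api flow without the screenshots, the paging loop described above can be sketched in plain Java. This is only an illustration of the logic, not the job itself: the class `PagedFetch`, the `Page` holder and the `fetch` function are hypothetical names, and the list stands in for the in-memory hash storage.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class PagedFetch {

    // One API response: the raw XML body plus the link to the next page.
    // nextLink is null when there are no more pages (an assumption about the API).
    public static final class Page {
        final String body;
        final String nextLink;

        public Page(String body, String nextLink) {
            this.body = body;
            this.nextLink = nextLink;
        }
    }

    // Follow the "next" links until none is returned, keeping every body in memory.
    // The fetch function is a placeholder for the actual REST call.
    public static List<String> fetchAll(String firstUrl, Function<String, Page> fetch) {
        List<String> bodies = new ArrayList<>();
        String url = firstUrl;
        while (url != null) {
            Page page = fetch.apply(url);
            bodies.add(page.body);   // stored untouched, so no schema is needed here
            url = page.nextLink;     // drives the next iteration of the loop
        }
        return bodies;
    }
}
```

Because each body is kept as an opaque XML string, this part stays schema agnostic; the schema only matters in the later, schema-specific step.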
So, hopefully this clarifies my problem. I want to read an XML document and then apply a schema against it that generates an exact replica of the XML as output. This just isn't possible with the tXMLMap component due to the limitations stated above. As I mentioned, other components, e.g. tAdvancedXMLOutput, are not suitable either, as they require a predefined tabular schema as input to the Linker Source. My XML is a Document type and therefore cannot be used as input to this component.
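To make the goal concrete, here is a minimal sketch of "same structure out, sensitive values masked", written in plain Java with the JDK's DOM classes rather than any Talend component. It is not a confirmed Talend solution. It assumes the Document value can be converted to and from an XML string, and the class name `XmlMasker`, the `mask` method and the element names used in the usage note (`Name`, `NatureType`, `Status`) are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XmlMasker {

    // Replace the text of every element whose tag name is in sensitiveNames.
    // All other nodes are left alone, so repeated groups at any depth survive.
    // Assumes the sensitive elements are leaves: setTextContent would drop child elements.
    public static String mask(String xml, Set<String> sensitiveNames, String replacement)
            throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        for (String name : sensitiveNames) {
            NodeList nodes = doc.getElementsByTagName(name);
            for (int i = 0; i < nodes.getLength(); i++) {
                ((Element) nodes.item(i)).setTextContent(replacement);
            }
        }

        // Serialise the modified tree back to a string.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```

Called on an application element that contains a Name, a NatureTypes group with several NatureType children and a Statuses group with Status children, `mask(xml, names, "***")` with `names` holding only `Name` would change the Name text and leave both repeating groups in place. Because the tree is edited in place instead of being rebuilt from a tabular flow, no loop or group settings are involved.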
Please can somebody help?