Hello, I have to build a data warehouse and an ODS with XML files as the input source. I have the XSD file for the XML files, but I have a problem: the XML files all have different structures. They share the same set of tags, but some files have more tags, some have fewer, and the tags can appear in a different order. I'll get an export file each week containing a set of these XML files. How can I parse them? Is it even possible, given that the tag order changes from file to file? For instance, first XML file: <student> <name>Paul</name> </student> <college>Minneapolis</college> Second XML file: <student> <name>Paul</name> <age>17</age> </student> Thank you for your help.
You can do this with tExtractXMLField components (https://help.talend.com/search/all?query=tExtractXMLField&content-lang=en) and a bit of XPath knowledge. There is no requirement for every tag to be present, and because XPath selects elements by name and path rather than by position, the order of the tags does not matter either. All you have to do is make sure your XPath mappings cover every tag that can appear in any of the possible structures (it sounds like the structure will be fairly consistent, with just some tags missing from some files). The complicated part comes with loops and complex structures. As a rule of thumb, plan on one tExtractXMLField component for every loop or complex structure type, and pass the structure out of one tExtractXMLField component as a Node into the next tExtractXMLField component. If you do that, you should be able to work your way through it. It won't be easy the first time, but you will learn a lot doing it.
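To illustrate why missing and reordered tags are not a problem, here is a minimal plain-Java sketch of the same idea tExtractXMLField is built on: a "loop" XPath that yields one row per element, plus one XPath per output column evaluated relative to each loop node. It is not code generated by Talend. The class name `StudentExtractor`, the column mapping, and the `<export>` root element are hypothetical; the root is assumed because the first example in the question has two top-level elements and would not be well-formed XML on its own.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: a loop XPath plus per-column XPaths,
// each evaluated relative to the current loop node.
public class StudentExtractor {
    // Output column -> XPath relative to one <student> node (assumed mapping).
    private static final Map<String, String> FIELDS = new LinkedHashMap<>();
    static {
        FIELDS.put("name", "name");
        FIELDS.put("age", "age");
        FIELDS.put("college", "../college"); // sibling of <student> in the first example
    }

    public static List<Map<String, String>> extract(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Loop XPath: one output row per <student> element.
            NodeList students =
                    (NodeList) xpath.evaluate("//student", doc, XPathConstants.NODESET);
            List<Map<String, String>> rows = new ArrayList<>();
            for (int i = 0; i < students.getLength(); i++) {
                Node student = students.item(i);
                Map<String, String> row = new LinkedHashMap<>();
                for (Map.Entry<String, String> field : FIELDS.entrySet()) {
                    // Matches by element name, so tag order does not matter;
                    // a missing tag gives null instead of an error.
                    Node hit = (Node) xpath.evaluate(field.getValue(), student, XPathConstants.NODE);
                    row.put(field.getKey(), hit == null ? null : hit.getTextContent().trim());
                }
                rows.add(row);
            }
            return rows;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Both samples are wrapped in an assumed <export> root element.
        String first = "<export><student><name>Paul</name></student>"
                + "<college>Minneapolis</college></export>";
        // Same tags as the second example, deliberately reordered.
        String second = "<export><student><age>17</age><name>Paul</name></student></export>";
        System.out.println(extract(first));
        System.out.println(extract(second));
    }
}
```

Running `main` prints `[{name=Paul, age=null, college=Minneapolis}]` and then `[{name=Paul, age=17, college=null}]`: the same column mapping handles both files, with absent tags simply coming through as null. In a Talend job the loop XPath and the column XPaths would go into the tExtractXMLField settings instead of Java code.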
I see. I hadn't thought of that. But now that I have, you could use a tFileInputRaw component to read the data in as a String, then use a tConvertType component to convert the String to a Document. You will then have your XML Document inside the job and can use the method described above to extract useful data from it.
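For anyone curious what those two steps amount to under the hood, here is a rough plain-Java equivalent: read the whole file into one String, then parse that String into an `org.w3c.dom.Document`. This is a sketch, not the code Talend generates; the class and method names are hypothetical, UTF-8 file encoding is assumed, and `Files.readString`/`Files.writeString` require Java 11 or later.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper mirroring "read file as String" followed by "String -> Document".
public class RawToDocument {
    // Step 1: read the entire file into a single String (UTF-8 assumed).
    public static String readRaw(Path file) throws IOException {
        return Files.readString(file, StandardCharsets.UTF_8);
    }

    // Step 2: parse that String into a DOM Document that XPath can query.
    public static Document toDocument(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Write a throwaway sample file, then run both steps on it.
        Path tmp = Files.createTempFile("export", ".xml");
        Files.writeString(tmp, "<export><student><name>Paul</name><age>17</age></student></export>");
        Document doc = toDocument(readRaw(tmp));
        System.out.println(doc.getDocumentElement().getNodeName()); // root element name
        Files.delete(tmp);
    }
}
```

Once the Document exists, it can be queried with the same kind of XPath expressions used in tExtractXMLField.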