I have an XML file structured (more or less) like this example (hopefully I made it clear enough, the real file is much longer and with many more elements/sub elements) with unbalanced hierarchies.
I wish to create *one* output table to contain this data.
Could you pls suggest the best method to do so (all solutions I found up till now suggested more than one output for this kind of situation)?
Thank you in advance.
The reason most suggestions point you to outputting to several tables is that an XML file is essentially a mini relational database. You can of course keep all records in a database in one table, but the data duplication will be horrendous. You should be able to use any of the examples you have seen previously, but join your extracted data so that it is formatted to one table. However, what you will get will be somewhat similar to the brief example below....
<notes> <note> <to> <address>firstname.lastname@example.org</address> <address>email@example.com</address> <address>firstname.lastname@example.org</address> </to> <cc> <address>email@example.com</address> <address>firstname.lastname@example.org</address> </cc> <from>Jani@jani.com</from> <heading>Reminder</heading> <body>Don't forget me this weekend!</body> </note> </notes>
If we represent this as you would expect it to be held in a database, you would get 3 tables; note, to and cc. This record would represent 1 "note" record, 3 "to" records and 2 "cc" records. If this were to be represented as a single "note" table, we either need lots of "to" and "cc" columns to handle many addresses, or we need to create a record for every permutation of data. This would mean 6 (1x3x2) note records (shown below)....
Jani@jani.com, Reminder, Don’t forget me this weekend!,email@example.com, firstname.lastname@example.org Jani@jani.com, Reminder, Don’t forget me this weekend!,email@example.com, firstname.lastname@example.org Jani@jani.com, Reminder, Don’t forget me this weekend!,email@example.com, firstname.lastname@example.org Jani@jani.com, Reminder, Don’t forget me this weekend!,email@example.com, firstname.lastname@example.org Jani@jani.com, Reminder, Don’t forget me this weekend!,email@example.com, firstname.lastname@example.org Jani@jani.com, Reminder, Don’t forget me this weekend!,email@example.com, firstname.lastname@example.org
You will notice the data duplication in these records.
Of course, if you are happy with this, then there is no reason why you shouldn't do it this way. There is no different way of extracting XML data in this way. The methods you have seen will work. But you will have to join the data back in memory in order to produce an output like above. To do that you will need to create "keys" for every section that contains loops. So in the example above, you would need a key for the note and that key would need to be passed to the looped data sets ("to" and "cc"). Then when all of your data has bee extracted, you can rejoin it using the keys.
Hi rhall 2 0,
Thank you for your answer. I would like to try your suggestion with the duplication, could you explain how to get there?
As I said, you can probably follow the examples you have already seen and then join the data back together. Essentially what you need to do is start with tExtractXMLField component and extract the outermost data (as column data) and each iteration of the outermost loops (as nodes...there is a tick box). Then you need to pass that data to subsequent tExtractXMLField components and extract the data from the loops you have extracted in previous tExtractXMLField components. A nice trick to remember is that you can pass data through tExtractXMLField components by setting up the relevant columns in the schema and leaving their configuration inside the tExtractXMLField blank. So, if you extract a "name" column in the first tExtractXMLField, you can pass that data through the next tExtractXMLField component by simply setting its schema up to hold a "name" column, without configuring its XPath. The other thing you may need to do (to join the data) is to extract a "key" value for the looped data. So (for example), if you have a record of "person" which has a looped section listing "telephone_numbers", when you extract the telephone numbers you will want to provide a key to the person the numbers belong to.
For multiple looping in XML you should use Talend Data Mapper for ease. Talend Data Integration licensed version provides the feature of TDM.