How do I Download a series of XML/JSON files, and merge into a table?

Hi Everyone
I've spent many an hour in Talend for other things, but can't seem to crack this and was hoping people could help.
I have a CSV file containing a single column of numbers, and these numbers determine the URL of the data I need to download. These numbers will change regularly, so Talend needs to read this CSV. It might look like:
123
345
567
Talend should iterate through this, downloading the relevant data from a separate URL for each number. The URLs will look something like:
https://www.eventbrite.co.uk/xml/event_list_attendees?id=123
https://www.eventbrite.co.uk/xml/event_list_attendees?id=345
https://www.eventbrite.co.uk/xml/event_list_attendees?id=567
The XML in this is not complex, just an XML representation of a basic spreadsheet, effectively.
Once downloaded, these separate data files (metadata and structure are the same for all of them) should be merged into one flat file, so I can output this as a single large CSV file.
Finally, I should add that when downloading the data I can specify whether it is in XML or JSON format. I am happy to use whichever people think will be easier.
Can someone explain how I would do this? Much appreciated
thanks
stony
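
For anyone reading along, the ID-to-URL step described above can be sketched in plain Java, outside Talend. This is only an illustration: the class and method names are made up, and the base URL is simply copied from the examples in the post. Inside a Talend job the same string concatenation would presumably go into the URI field of the download component.

```java
import java.util.ArrayList;
import java.util.List;

public class EventUrlBuilder {
    // Base URL copied from the examples in the post.
    static final String BASE =
            "https://www.eventbrite.co.uk/xml/event_list_attendees?id=";

    // Turn the lines of the single-column CSV into one URL per ID,
    // trimming whitespace and skipping blank lines.
    static List<String> buildUrls(List<String> csvLines) {
        List<String> urls = new ArrayList<>();
        for (String line : csvLines) {
            String id = line.trim();
            if (!id.isEmpty()) {
                urls.add(BASE + id);
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        // Sample IDs from the post; a real run would read them from the CSV file.
        for (String url : buildUrls(List.of("123", "345", "567"))) {
            System.out.println(url);
        }
    }
}
```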

Re: How do I Download a series of XML/JSON files, and merge into a table?

Hi Stony
I created a job for your requirement. See the following images for more details.
You wrote: "Finally, I should add that when downloading the data I can specify whether it is in XML or JSON format."

I download all the data without specifying a file type.
If you use tFileInputXML, it will read the XML data and ignore JSON data automatically.
If you use tFileInputJSON, it will read only the JSON data.
Best regards!
Pedro

Re: How do I Download a series of XML/JSON files, and merge into a table?

Thanks, Pedro, worked like a charm!
PM me your PayPal email address and I'll send you a six-pack's worth of beer money.
For anyone else who tries this, I got stuck for a while with tFileList iterating, but tFileInputXML doing nothing. I had a mess around with the Property type and Schema type, and in the end set both to Repository (after duplicating and tweaking the repository of the XML).
I also added a second CSV file as the reject file of tFileInputXML - though I don't know if this is necessary as this hasn't had any errors yet.
Finally, if you start getting weird messages about a dot being required in the domain name, make sure you have set Save Cookies to checked for tFileFetch - that worked for me.

Re: How do I Download a series of XML/JSON files, and merge into a table?

Hi
Set 'File Name/Stream' of tFileInputXML like this: ((String)globalMap.get("tFileList_1_CURRENT_FILEPATH")).
If you still encounter any error, please let me know and show me more details.
Best regards!
Pedro
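
As background on why the cast in that expression is needed: as far as I know, Talend's globalMap is an ordinary Java map that stores its values as Object, so a value read back has to be cast to String. The sketch below imitates that with a plain HashMap. The class name, the helper method and the file path are hypothetical; only the key name comes from the expression above.

```java
import java.util.HashMap;
import java.util.Map;

public class GlobalMapSketch {
    // Stand-in for Talend's globalMap (assumed here to be a Map<String, Object>).
    static final Map<String, Object> globalMap = new HashMap<>();

    // Same expression as in the post: look up the key and cast Object to String.
    static String currentFilePath() {
        return ((String) globalMap.get("tFileList_1_CURRENT_FILEPATH"));
    }

    public static void main(String[] args) {
        // Hypothetical path, set by hand here; in a job, tFileList_1 would
        // update this entry on each iteration.
        globalMap.put("tFileList_1_CURRENT_FILEPATH", "/tmp/fetched/123.xml");
        System.out.println(currentFilePath());
    }
}
```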

Re: How do I Download a series of XML/JSON files, and merge into a table?

Hi Pedro
Thanks again, it's looking good.
I got stuck when saving the data with tFileOutputDelimited: of the many files and rows read, only the rows from one file were saved. Checking "Append" fixed that, and all the files and rows read were saved to the CSV.
However, this meant that every time I reran my job, the CSV got another set of duplicate rows. Is the easiest way around this to run tFileDelete to clear the previous run's data?
I include a screenshot of how I currently have it set up.
thanks
stony
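
The "delete once, then append" idea in the question above can be sketched in plain Java as follows. This is not Talend code: the class and method names are hypothetical, and the rows are dummy data. The point is only that the reset happens once per run, before any input file is processed, while every input file appends.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

public class ResetThenAppend {
    // Remove the previous run's output; does nothing if the file is absent.
    static void reset(Path output) throws IOException {
        Files.deleteIfExists(output);
    }

    // Append the rows read from one input file, creating the output if needed.
    static void appendRows(Path output, List<String> rows) throws IOException {
        Files.write(output, rows, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    public static void main(String[] args) throws IOException {
        Path output = Files.createTempFile("merged", ".csv");
        // Simulate running the job twice over the same two input files.
        for (int run = 0; run < 2; run++) {
            reset(output);                               // once per run
            appendRows(output, List.of("a,1", "b,2"));   // rows from file 1
            appendRows(output, List.of("c,3"));          // rows from file 2
        }
        // The second run replaced the first run's rows instead of duplicating them.
        System.out.println(Files.readAllLines(output).size());
    }
}
```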

Re: How do I Download a series of XML/JSON files, and merge into a table?

Hi
I created a new job to fit your request.
In short, you need to keep a lookup file of the records you have already used to fetch files, and use the reject rows of the join against it to fetch only the new files.
Then use tFileDelete to delete the directory in which you save the fetched files.
See the uploaded image.
Subjob one: tFileInputDelimited_2 reads the lookup data. In tMap1, you should inner join the two input flows and take the reject rows.
Subjob two: This appends the used records to the lookup file.
Subjob four: Delete the directory in which you save the fetched files.

Best regards!
Pedro
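
To illustrate what "inner join and get the reject rows" amounts to, here is a rough plain-Java sketch of the same logic: keep only the IDs that have no match in the lookup of already-fetched IDs. The class name, method name and sample IDs are hypothetical, and this is not how tMap is implemented, just the equivalent filtering.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class LookupRejects {
    // Return the IDs with no match in the lookup, i.e. the rows an
    // inner join against the lookup would reject.
    static List<String> notYetFetched(List<String> ids, List<String> alreadyFetched) {
        Set<String> seen = new HashSet<>(alreadyFetched);
        List<String> rejects = new ArrayList<>();
        for (String id : ids) {
            if (!seen.contains(id)) {
                rejects.add(id);
            }
        }
        return rejects;
    }

    public static void main(String[] args) {
        List<String> ids = List.of("123", "345", "567"); // current CSV of IDs
        List<String> lookup = List.of("123");            // IDs fetched on earlier runs
        // Only the unmatched IDs still need to be downloaded.
        System.out.println(notYetFetched(ids, lookup));
    }
}
```

After a successful download, the newly fetched IDs would be appended to the lookup so that the next run skips them.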