Every 2 hours I get a new JSON file in an S3 bucket, and I need to pick the most recently modified file so that I can map it to the relevant SQL table for output. The file names differ because they are generated from the date on which they are processed.
Example: Fri Mar 09 2018 11:22:54 GMT+0000 (UTC).json
Can someone help me implement this in Talend?
Thanks in advance.
To get the newest file, we first download the files from the bucket with tS3Get, then list them and get the properties for each of them. We then sort the file properties by "mtime" (the last-modified time) and take the newest file for further processing.
1) tFileList: this component is configured to list the downloaded files (for example, with a "*.json" filemask).
2) tFileProperties: this component will retrieve the properties for each file.
3) tBufferOutput: this component will store the file properties in memory so we can sort them once we've got info on all the files.
4) tBufferInput: this component reads back the buffer we populated with the file property information.
5) tSortRow: this component sorts the files by mtime in descending order (meaning the newest file will be first in the list).
6) tSampleRow: this component keeps only the first row coming out of tSortRow, i.e. the newest file.
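If it helps to see the logic outside the Studio, here is a minimal plain-Java sketch of the same list → properties → buffer → sort → first-row pipeline. It assumes the S3 files have already been downloaded to a local directory and end in ".json"; the class and method names (NewestFile, newestFile, FileInfo) are hypothetical and not Talend components. It needs Java 16+ because of the record.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.FileTime;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class NewestFile {

    // Hypothetical row type: one "file properties" row (path + last-modified time).
    record FileInfo(Path path, FileTime mtime) {}

    // Returns the most recently modified regular file in dir matching glob, if any.
    static Optional<Path> newestFile(Path dir, String glob) throws IOException {
        // Steps 1-3: list matching files and buffer their properties in memory.
        List<FileInfo> buffer = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    buffer.add(new FileInfo(p, Files.getLastModifiedTime(p)));
                }
            }
        }
        // Steps 4-5: sort the buffered rows by mtime, descending (newest first).
        buffer.sort(Comparator.comparing(FileInfo::mtime, Comparator.reverseOrder()));
        // Step 6: keep only the first row.
        return buffer.stream().findFirst().map(FileInfo::path);
    }

    public static void main(String[] args) throws IOException {
        // Assumed download directory; defaults to the current directory.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        newestFile(dir, "*.json").ifPresentOrElse(
                p -> System.out.println("Newest file: " + p),
                () -> System.out.println("No .json files found in " + dir));
    }
}
```

Usage: `java NewestFile.java /path/to/downloaded/files`. Sorting the whole buffer mirrors what tSortRow does; a single pass that tracks the maximum mtime would give the same result without the sort.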
Let us know if this works for you.