Retrieving files in a S3 bucket using the latest modified date

Five Stars

Hi All,
Every 2 hours I get a new JSON file in an S3 bucket, and I have to take the latest modified file so that I can map it to the relevant SQL table for output. The file names differ, as they are generated depending on the day they are processed.

 

Examples:

Fri Mar 09 2018 11:22:54 GMT+0000 (UTC).json
Wed Mar 14 2018 10:09:15 GMT+0000 (UTC).json


Can someone help me implement this using Talend?
Thanks in advance.

Naledi

Moderator

Re: Retrieving files in a S3 bucket using the latest modified date

Hello,

To get the newest file, we will fetch the files with tS3Get and then get the properties for each of them. We will then sort the file properties by "mtime" (the last modified time) and grab the newest one for further processing.

1) tFileList: this component is configured to look for files.
2) tFileProperties: this component will retrieve the properties for each file. 
3) tBufferOutput: this component will store the file properties in memory so we can sort them once we've got info on all the files.
4) tBufferInput: this component will read from the buffer we populated with file property information.
5) tSortRow: this component will sort the files by mtime descending (meaning the newest file will be first in the list).
6) tSampleRow: this component is how we grab only the first row coming out of tSortRow

modifieddate.png

Let us know if it is OK with you.

Best regards

Sabrina

 

LI

Four Stars

Re: Retrieving files in a S3 bucket using the latest modified date

What if your files on S3 are large?

It's unrealistic to pull all files locally and then get properties.

Ideally, couldn't you use tS3List to get the modified date as a parameter, and then use it to decide which key to pull down locally?
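
As a rough illustration of that idea, here is a minimal Java sketch that picks the newest key from listing metadata alone, so that only one object has to be downloaded. The `Entry` class and `newestKey` method are hypothetical, and how a Talend job would populate the key and last-modified values is an assumption that is not shown here. With the AWS SDK for Java (v1), for example, each `S3ObjectSummary` returned by a listing exposes `getKey()` and `getLastModified()`.

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical helper, named for illustration only.
class NewestS3Key {

    /** Hypothetical holder for one listing entry: object key plus last-modified time. */
    static final class Entry {
        final String key;
        final Instant lastModified;

        Entry(String key, Instant lastModified) {
            this.key = key;
            this.lastModified = lastModified;
        }
    }

    /** Returns the key of the most recently modified entry, or empty for an empty listing. */
    static Optional<String> newestKey(List<Entry> listing) {
        return listing.stream()
                // the entry with the greatest last-modified time is the newest
                .max(Comparator.comparing((Entry e) -> e.lastModified))
                .map(e -> e.key);
    }
}
```

Only the key returned by `newestKey` would then need to be pulled down locally, which avoids downloading every large file just to read its properties.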

 

It's a shame: there is a tFTPFileProperties component, but nothing equivalent for S3.

 
