[resolved] How to pick a file from S3 with latest date


Hi All,
Every 2 hours I get a new file in S3, and I have to pick the latest file from S3 based on its timestamp.
Example: My_File_20141104000001.csv
         My_File_20141104030001.csv
Can someone help me implement this using Talend?
Thanks in advance.
Rajesh

Accepted Solutions

Re: [resolved] How to pick a file from S3 with latest date

This is a very common task that is not super easy to implement in Talend.
Please have a look at my example job, described below, and let me know if this helps you or if I can assist further.
To get the newest file, we first get a list of files and then get the properties of each one. We then sort the file properties by "mtime" (the last modified time) and grab the newest file for further processing.
1) tFileList: this component is configured to look for files that start with my chosen string
2) tFileProperties: this component retrieves the properties of each file
3) tBufferOutput: this component stores the file properties in memory so we can sort them once we have info on all the files
4) tBufferInput: this component reads from the buffer we populated with file property information
5) tSortRow: this component sorts the files by mtime descending (meaning the newest file will be first in the list)
6) tSampleRow: this component is how we grab only the first row coming out of tSortRow
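
For readers who want to see the same pipeline outside Talend, here is a minimal Java sketch of the steps above: filter by prefix, read each file's last-modified time, sort descending, keep the first row. The names NewestFileByMtime, FileProps, collect and newest are made up for illustration and are not Talend APIs, and the sketch assumes the files sit in a local directory.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical helper, not part of Talend: mirrors the six-step job in plain Java.
class NewestFileByMtime {

    /** One buffered row: file name plus last-modified time in epoch milliseconds. */
    static final class FileProps {
        final String name;
        final long mtime;

        FileProps(String name, long mtime) {
            this.name = name;
            this.mtime = mtime;
        }
    }

    /** Steps 1-4: buffer the properties of every regular file in dir whose name starts with prefix. */
    static List<FileProps> collect(File dir, String prefix) {
        List<FileProps> buffer = new ArrayList<>();
        File[] files = dir.listFiles((d, name) -> name.startsWith(prefix));
        if (files != null) { // listFiles returns null when dir is not a readable directory
            for (File f : files) {
                if (f.isFile()) {
                    buffer.add(new FileProps(f.getName(), f.lastModified()));
                }
            }
        }
        return buffer;
    }

    /** Steps 5-6: sort by mtime descending and keep only the first row, if any. */
    static Optional<FileProps> newest(List<FileProps> buffer) {
        return buffer.stream()
                .sorted((a, b) -> Long.compare(b.mtime, a.mtime)) // largest mtime first
                .findFirst();
    }
}
```

Calling NewestFileByMtime.newest(NewestFileByMtime.collect(new File("/some/folder"), "My_File")) would return the most recently modified matching file, or an empty Optional when nothing matches; the folder path is a placeholder.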



All Replies

Re: [resolved] How to pick a file from S3 with latest date

Hi Rajesh
Let me first make sure I've captured your requirements correctly:
1. Your source folder is fixed.
2. You intend to run your job every 2 hrs.
3. On every execution, you wish to pick the latest file (irrespective of its name).
Your confirmation would help in formulating a better solution.
MathurM

Re: [resolved] How to pick a file from S3 with latest date

Hi MathurM,
Thanks for your reply.
1. Your source folder is fixed.
Yes, my source folder is fixed.
2. You intend to run your job every 2 hrs.
Yes, my job has to run every 2 hours.
3. On every execution, you wish to pick the latest file (irrespective of its name).
My file name always starts with the same prefix (My_File), and my job has to pick only the file starting with My_File that has the latest date.
Thanks
Rajesh
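
Since the file names in this thread carry a yyyyMMddHHmmss timestamp, one option is to pick the latest file from the name alone, without looking at modification times. Below is a rough Java sketch, assuming every relevant object key ends in My_File_<14 digits>.csv as in the example names. The class LatestByNameTimestamp and its methods are hypothetical, and the list of keys would have to come from whatever component or client lists the bucket.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: chooses the key whose embedded timestamp is the latest.
class LatestByNameTimestamp {

    // Assumed naming scheme, taken from the example names: My_File_<yyyyMMddHHmmss>.csv
    private static final Pattern NAME = Pattern.compile("My_File_(\\d{14})\\.csv");
    private static final DateTimeFormatter STAMP = DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    /** Returns the timestamp embedded in the key, or empty if the key does not match the scheme. */
    static Optional<LocalDateTime> stampOf(String key) {
        String name = key.substring(key.lastIndexOf('/') + 1); // drop any folder prefix
        Matcher m = NAME.matcher(name);
        if (!m.matches()) {
            return Optional.empty();
        }
        try {
            return Optional.of(LocalDateTime.parse(m.group(1), STAMP));
        } catch (DateTimeParseException e) {
            return Optional.empty(); // 14 digits that are not a valid date and time
        }
    }

    /** Returns the key with the latest embedded timestamp, ignoring keys that do not match. */
    static Optional<String> latest(List<String> keys) {
        String best = null;
        LocalDateTime bestStamp = null;
        for (String key : keys) {
            Optional<LocalDateTime> stamp = stampOf(key);
            if (stamp.isPresent() && (bestStamp == null || stamp.get().isAfter(bestStamp))) {
                best = key;
                bestStamp = stamp.get();
            }
        }
        return Optional.ofNullable(best);
    }
}
```

With the two example names from the first post, latest would return My_File_20141104030001.csv. Unlike sorting by modified date, this does not depend on when the file was uploaded, only on the timestamp in its name.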

Re: [resolved] How to pick a file from S3 with latest date

Hi JohnGarrettMartin, I feel that with your solution above we have drifted a bit from the original problem.
Hi Rajesh,
I would suggest trying an approach along the lines of the job described below.
1. We first create a start flag and assign it a value, say 'T'.
2. Using the tFileList component, we iterate over all the files in the source folder. This component itself lets us sort the files: we can order them by 'modified date', in 'ASC' or 'DESC' order. In the present case we choose 'DESC', so the most recently modified file comes first.
3. We then process each file only if an 'IF' condition holds, i.e. the flag equals 'T'.
4. On successful processing of a file, we change the flag to 'F' via an 'OnSubjobOk' link.
5. As a result, after the first file has been processed successfully, the flag is 'F', the 'IF' condition is no longer fulfilled, and no further files are processed.
This way, only the latest file in the source folder is processed on every execution.
Hope this helps.
MathurM
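
The flag logic above can be sketched in plain Java as follows. This is not the tJava code from the original job (which was not posted). Here globalMap is an ordinary HashMap standing in for Talend's globalMap, and the key name "FLAG" and the class FirstFileOnlyFlag are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation of the flag-based job; not generated by or taken from Talend.
class FirstFileOnlyFlag {

    /**
     * Takes file names already ordered newest first (as tFileList would deliver them
     * when sorted by modified date DESC) and returns the files that get processed.
     */
    static List<String> run(List<String> filesNewestFirst) {
        Map<String, Object> globalMap = new HashMap<>(); // stand-in for Talend's globalMap
        List<String> processed = new ArrayList<>();

        globalMap.put("FLAG", "T");                      // step 1: start flag
        for (String file : filesNewestFirst) {           // step 2: iterate, newest first
            if ("T".equals(globalMap.get("FLAG"))) {     // step 3: the 'IF' condition
                processed.add(file);                     // placeholder for the real processing
                globalMap.put("FLAG", "F");              // step 4: flip the flag after success
            }
            // step 5: every later iteration sees 'F' and is skipped
        }
        return processed;
    }
}
```

The loop still visits every file, but only the first (newest) one passes the condition, which matches the behaviour described in step 5.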

Re: [resolved] How to pick a file from S3 with latest date

Hi,
Do you have rights to move files from the S3 bucket to another folder?
If yes, then once a file is processed, move it to an archive folder. This is much simpler than implementing workarounds.
Vaibhav
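
As a rough illustration of the archive idea, here is a Java sketch that moves a processed file into an archive folder. It works on a local path (for example, a file already downloaded from the bucket); moving an object inside S3 itself would go through an S3 component or client instead, which is not shown. ArchiveProcessedFile and archive are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper for local files only; not an S3 or Talend API.
class ArchiveProcessedFile {

    /** Moves file into archiveDir (created if missing) and returns the new location. */
    static Path archive(Path file, Path archiveDir) throws IOException {
        Files.createDirectories(archiveDir);
        Path target = archiveDir.resolve(file.getFileName());
        // Overwrite an older archived copy with the same name, if one exists.
        return Files.move(file, target, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

Once every processed file has been moved away, whatever is left in the source folder is by definition unprocessed, so the job no longer needs to work out which file is the latest.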

Re: [resolved] How to pick a file from S3 with latest date

Hi,

Can I get assistance with this solution? I am currently working on the same issue (picking up data from an S3 bucket based on the latest file).


Re: [resolved] How to pick a file from S3 with latest date

Hi Mathur,

Thanks for the advice. I know it has been five years since your post, but could you please add the tJava code, or a screenshot of the tJava component, that you used to select the latest file?

Thanks,

T.A
