I have a scenario in which I am running a Talend job that pulls all the files from an AWS S3 folder. But before doing so, I need to check whether the flag file is present there; only then can I proceed.
Suppose I have files like this:
Only if file_trigger.flag is present on AWS S3 should my job start pulling the files abc.csv, bcd.csv, etc.
I have already built the job that pulls all the files from S3, but this is a new scenario. Could you please let me know how to proceed?
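You could install the AWS CLI command-line client and set it up,
then use tSystem with the command
aws s3 ls s3://bucket/file_trigger.flag
tSystem will return 0 if the file exists and 1 if it does not; use that value in a RunIf trigger. (S3 keys are case-sensitive, so the key in the command must match the flag file name exactly.)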
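The same exit-code check can be sketched outside Talend in plain Java, which may help when testing the idea before wiring it into the job. This is a minimal sketch, not Talend's generated code: the class name FlagCheck, the helper exitCode, and the bucket name my-bucket are all hypothetical placeholders.

```java
import java.io.IOException;
import java.util.List;

// Hypothetical stand-in for the tSystem + RunIf pair: run an external
// command and treat exit code 0 as "flag file found".
public class FlagCheck {

    // Runs the command, discards its output (stdout and stderr), and
    // returns the process exit code.
    static int exitCode(List<String> command) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command)
                .redirectErrorStream(true)                          // merge stderr into stdout
                .redirectOutput(ProcessBuilder.Redirect.DISCARD)    // then throw both away
                .start();
        return p.waitFor();
    }

    public static void main(String[] args) throws Exception {
        // "my-bucket" is a placeholder; substitute your own bucket and flag key.
        List<String> check = List.of("aws", "s3", "ls", "s3://my-bucket/file_trigger.flag");
        int code = exitCode(check);
        if (code == 0) {
            System.out.println("Flag file found: continue with the file download");
        } else {
            System.out.println("Flag file missing (exit code " + code + "): stop here");
        }
    }
}
```

Inside Talend itself, the RunIf condition would typically read the component's exit value instead, for example ((Integer)globalMap.get("tSystem_1_EXIT_VALUE")) == 0, assuming the component is named tSystem_1.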
The AWS CLI is already installed.
Still, I need to connect to AWS before running the tSystem command. I know how to set up the connection and the other parts.
But how will tSystem get the AWS connection details and run the above command from Talend?
If you have installed the AWS CLI, you know it is self-contained: it does not need anything from Talend to work, but you must configure it separately (for example with aws configure).
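To make the "configure it separately" point concrete: the CLI reads credentials from its own sources, such as the file written by aws configure or the AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_DEFAULT_REGION environment variables, not from a Talend connection component. The sketch below shows one possible approach, passing those variables to the child process; the helper runWithEnv, the key values, the region and the bucket are hypothetical placeholders.

```java
import java.io.IOException;
import java.util.List;
import java.util.Map;

// Sketch: the AWS CLI does not see Talend's connection settings, so any
// credentials must reach it through its own configuration. Here they are
// handed over as environment variables of the child process.
public class CliWithCredentials {

    static int runWithEnv(List<String> command, Map<String, String> extraEnv)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command)
                .redirectErrorStream(true)
                .redirectOutput(ProcessBuilder.Redirect.DISCARD);
        // The child inherits the parent's environment plus these entries.
        pb.environment().putAll(extraEnv);
        return pb.start().waitFor();
    }

    public static void main(String[] args) throws Exception {
        // Placeholder values only; in practice do not hard-code keys in a job.
        Map<String, String> env = Map.of(
                "AWS_ACCESS_KEY_ID", "<your-access-key>",
                "AWS_SECRET_ACCESS_KEY", "<your-secret-key>",
                "AWS_DEFAULT_REGION", "us-east-1");
        int code = runWithEnv(
                List.of("aws", "s3", "ls", "s3://my-bucket/file_trigger.flag"), env);
        System.out.println("exit code: " + code);
    }
}
```

If aws configure has already been run for the user that executes the job, no extra variables are needed at all.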
So, in your job:
- tPrejob: the connection
- then, after the RunIf, all the other components
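The job layout can be modelled as plain control flow: the check decides, and the RunIf gates everything that follows. In this minimal sketch an in-memory list of keys stands in for the bucket (hypothetical data), and TriggerGate, flagPresent and filesToPull are illustrative names, not Talend components.

```java
import java.util.List;
import java.util.stream.Collectors;

// Plain-Java model of the job: a check looks for the flag, and the
// RunIf gates the rest. The key list stands in for the S3 folder.
public class TriggerGate {

    static final String FLAG = "file_trigger.flag";

    // The RunIf condition: true only when the exact flag key exists.
    static boolean flagPresent(List<String> keys) {
        return keys.contains(FLAG);
    }

    // What runs after the RunIf: every key except the flag itself.
    // Returns an empty list when the flag is absent, so nothing is pulled.
    static List<String> filesToPull(List<String> keys) {
        if (!flagPresent(keys)) {
            return List.of();
        }
        return keys.stream()
                .filter(k -> !k.equals(FLAG))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> bucket = List.of("file_trigger.flag", "abc.csv", "bcd.csv");
        System.out.println(filesToPull(bucket));             // [abc.csv, bcd.csv]
        System.out.println(filesToPull(List.of("abc.csv"))); // []
    }
}
```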
Below is another solution to your query. I have set up two sample files in S3 for testing.
Below is the output of the flow, which contains only the flag file.
Below are the details of each component. I have added tLogRow components for printing; you can remove them in the actual job.
Both approaches give the same result, but one uses the command line and the other uses the GUI components.
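The GUI variant amounts to listing every object and filtering for the flag name. The sketch below models that with an in-memory key list and counts the loop passes, which is the cost raised in the next reply; FullScanFilter, scan and ScanResult are hypothetical names, and the real flow uses tS3List plus a filter component.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the GUI variant: iterate over every key in the folder
// (as tS3List does without a narrow prefix) and keep only the flag file.
// The counter shows why this gets slow when the folder holds many files.
public class FullScanFilter {

    // Holds the matching keys and how many keys were visited.
    static final class ScanResult {
        final List<String> matches;
        final int iterations;

        ScanResult(List<String> matches, int iterations) {
            this.matches = matches;
            this.iterations = iterations;
        }
    }

    static ScanResult scan(List<String> allKeys, String flagName) {
        List<String> matches = new ArrayList<>();
        int iterations = 0;
        for (String key : allKeys) {
            iterations++;                 // one loop pass per listed object
            if (key.equals(flagName)) {   // the filter step
                matches.add(key);
            }
        }
        return new ScanResult(matches, iterations);
    }

    public static void main(String[] args) {
        List<String> keys = List.of("abc.csv", "bcd.csv", "file_trigger.flag");
        ScanResult r = scan(keys, "file_trigger.flag");
        // prints: [file_trigger.flag] found after 3 iterations
        System.out.println(r.matches + " found after " + r.iterations + " iterations");
    }
}
```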
I excluded this variant because, as it stands, it requires reading all the files twice (as in the picture), and with 1000 files that could take a long time.
But if you use the full flag file name as the Key prefix in tS3List, it could reduce the number of iterations to 1.
Unfortunately, I don't have S3 access right now and cannot test it, but if it works it could be the best choice.
@vapukov - Perfect idea!
I added the flag file name in the Key prefix and it reduced the number of iterations to 1.
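Why the prefix helps: S3 applies the prefix on the server, so the job only iterates over keys that start with it. In this sketch a sorted set mimics that server-side filtering (an assumption for illustration; PrefixLookup and its methods are hypothetical names). One caveat follows from how prefixes work: a prefix also matches longer names such as file_trigger.flag.bak, so the full key is compared before treating the flag as present.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Sketch of the Key prefix idea. A sorted set stands in for S3's
// server-side prefix filter: tailSet(prefix) jumps to the first candidate.
public class PrefixLookup {

    // Returns every key starting with the prefix; the list size equals
    // the number of iterations the downstream flow would perform.
    static List<String> listWithPrefix(NavigableSet<String> keys, String prefix) {
        List<String> out = new ArrayList<>();
        for (String key : keys.tailSet(prefix, true)) {
            if (!key.startsWith(prefix)) {
                break;  // sorted order: no later key can start with the prefix
            }
            out.add(key);
        }
        return out;
    }

    // A prefix also matches longer names, so require an exact match.
    static boolean flagPresent(NavigableSet<String> keys, String flagKey) {
        return listWithPrefix(keys, flagKey).contains(flagKey);
    }

    public static void main(String[] args) {
        NavigableSet<String> bucket =
                new TreeSet<>(List.of("abc.csv", "bcd.csv", "file_trigger.flag"));
        List<String> hits = listWithPrefix(bucket, "file_trigger.flag");
        // prints: [file_trigger.flag] in 1 iteration(s)
        System.out.println(hits + " in " + hits.size() + " iteration(s)");
    }
}
```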
The modified job flow is as below.