How to edit files in S3 bucket using talend

Five Stars

Hi all,

 

My current scenario is to upload a file to an S3 bucket, apply some transformations once the file is available there, and then load the transformed file back to the same S3 bucket under a new name.

I'm able to upload the file to the S3 bucket, but I'm not able to read the file back and apply a simple transformation in the job.

Question 1: Is it possible to edit a txt or csv file that is in an S3 bucket with a Talend job?

Question 2: If yes, how does the job design need to be modified?

Please look at the screenshot of my job design.

 

Regards,

SS


Accepted Solutions
Employee

Re: How to edit files in S3 bucket using talend

Hi,

 

     My current understanding is that you are processing a single source file that you want to copy to S3, and that you would also like to modify this data and store the modified file in S3 as well. In this scenario, you only need tS3Put to load the data to S3 after your modifications.

 

     However, if there are multiple source files that you would like to move to S3, you will have to use a tFileList to iterate over them and process each file to S3 in turn.

 

     Now, if your situation is different and S3 is the source file location, then you will have to bring that file to a local directory before modifying it with other Talend components. Once the modification is complete, you can push the file back to the S3 bucket using tS3Put.
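
For anyone who wants to see the same round trip outside of Studio, here is a minimal Python sketch using boto3. It is not the Talend job itself, only an illustration of the download, edit locally, upload pattern described above (in a Talend job the download step would typically be a tS3Get and the upload a tS3Put). The function names, the bucket and key names, and the "strip whitespace from every field" transformation are all placeholders made up for this example.

```python
import csv
import os
import tempfile


def transform_csv(src_path, dst_path):
    """Placeholder transformation: strip surrounding whitespace from every field."""
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            writer.writerow([field.strip() for field in row])


def edit_s3_file(s3, bucket, src_key, dst_key):
    """Download src_key, transform it on local disk, upload the result as dst_key."""
    with tempfile.TemporaryDirectory() as workdir:
        local_in = os.path.join(workdir, "input.csv")
        local_out = os.path.join(workdir, "output.csv")
        s3.download_file(bucket, src_key, local_in)   # S3 -> local
        transform_csv(local_in, local_out)            # edit the local copy
        s3.upload_file(local_out, bucket, dst_key)    # local -> S3, new name


# Hypothetical usage (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   edit_s3_file(boto3.client("s3"), "my-bucket", "incoming/data.csv", "cleaned/data_clean.csv")
```

The point of the sketch is that the file is never edited in place inside the bucket: a local copy is changed and then written back as a new object.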

 

Warm Regards,

 

Nikhil Thampi



All Replies
Employee

Re: How to edit files in S3 bucket using talend

Hi,

 

     Could you please remove tS3List_1 and use the same local file that you pass as the source of tS3Put_1 as the source for the transformation in the next subjob? Once the transformation is complete, you can push the modified file to S3 as well, using tS3Put.

 

     Below is the skeleton diagram of the process.

[Image: image.png]

 

 

If you have more than one file to process, you can put these components in a child job and pass the file name to it as a parameter. You can then call the child job iteratively until all the files in your source folder have been processed successfully.
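
As a rough illustration of that parent and child structure, here is a small Python sketch (again, not Talend code). `process_file` plays the role of the child job that receives one file name, and `process_folder` plays the role of the parent that iterates over the source folder. The function names, the `_clean` suffix, the `*.csv` mask, and the upper-casing transformation are hypothetical placeholders; `upload` is any callable that sends a local file to S3 under the given key.

```python
from pathlib import Path


def process_file(path, out_dir, upload):
    """'Child job': upload the original file, transform it, upload the result."""
    upload(path, path.name)
    cleaned = out_dir / f"{path.stem}_clean{path.suffix}"
    # Placeholder transformation: upper-case the whole file.
    cleaned.write_text(path.read_text(encoding="utf-8").upper(), encoding="utf-8")
    upload(cleaned, cleaned.name)
    return cleaned


def process_folder(src_dir, out_dir, upload, pattern="*.csv"):
    """'Parent job': call process_file once per matching file in src_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    return [process_file(p, out_dir, upload)
            for p in sorted(Path(src_dir).glob(pattern))]
```

With boto3, `upload` could be something like `lambda path, key: s3.upload_file(str(path), "my-bucket", key)`, where `s3` is a boto3 S3 client and `"my-bucket"` is a placeholder bucket name.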

 

If the answer has helped you, could you please mark the topic as resolved? Kudos are also welcome :-)

 

Warm Regards,

 

Nikhil Thampi

Five Stars

Re: How to edit files in S3 bucket using talend

Hi Nikhil,

The proposed solution cannot be achieved, because the tS3Put component only has the source location of the file, while tFileInputDelimited needs to fetch the source file location and cannot be pointed at a local directory (the file is in S3). To get the file location I need to use tS3List to list the files and get the file name to process.

So my question is whether Talend can apply some minor business transformations or cleansing to a file inside the S3 bucket and write the result back to the S3 bucket, so that the final transformed or cleansed data can then be moved to a cloud DB for further processing.
