Four Stars

Copy data from AWS S3 to AWS EMR

Hello,

I need to copy data from S3 to EMR in AWS. Can someone please let me know which component I can use to write data to EMR? I am using Talend Studio as part of the Data Management Platform.

Moderator

Re: Copy data from AWS S3 to AWS EMR

Hi,

 

If we understand your requirement correctly, you can use the tS3Get component to retrieve a file from Amazon S3.
The workflow would be: tS3Connection --> tS3Get (retrieve files from S3 to local) --> tFileUnarchive (unzip your file) --> EMR cluster (Amazon EMR). Let us know if this works for you.
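
For readers who want to see the same steps outside Talend Studio, here is a rough sketch that only builds the equivalent command lines (download from S3, unzip, put into HDFS). It is not how the Talend components work internally. The function name, bucket, key, and paths are hypothetical placeholders, and it assumes the AWS CLI, unzip, and a Hadoop client configured for the EMR cluster are installed on the machine that runs the commands.

```python
import shlex


def build_copy_commands(bucket, key, local_dir, hdfs_dir):
    """Build commands for: S3 -> local file -> unzip -> HDFS on the cluster.

    Hypothetical helper for illustration only; it returns the commands
    as argument lists and does not execute anything.
    """
    # Local path of the downloaded archive (file name taken from the S3 key).
    archive = f"{local_dir}/{key.rsplit('/', 1)[-1]}"
    # Directory that receives the unzipped files.
    extracted = f"{local_dir}/extracted"
    return [
        # Step 1: download the archive from S3 (roughly what tS3Get does).
        ["aws", "s3", "cp", f"s3://{bucket}/{key}", archive],
        # Step 2: unzip it locally (roughly what tFileUnarchive does).
        ["unzip", "-o", archive, "-d", extracted],
        # Step 3: copy the extracted directory into HDFS on the cluster.
        ["hdfs", "dfs", "-put", "-f", extracted, hdfs_dir],
    ]


if __name__ == "__main__":
    # Placeholder values; replace with your own bucket, key, and paths.
    for cmd in build_copy_commands(
        "my-bucket", "incoming/data.zip", "/tmp/work", "/user/hadoop/incoming"
    ):
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to review them before running anything against a real bucket or cluster.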
Best regards

Sabrina

Four Stars

Re: Copy data from AWS S3 to AWS EMR

Thanks for the response.

 

We don't really have a need to download files locally. Is it possible to push data from S3 to EMR directly?

 

Also, in your proposed solution, which component handles the final (EMR cluster) step? The only EMR components that I see are "tAmazonEMRResize", "tAmazonEMRListInstances", and "tAmazonEMRManage".

Moderator

Re: Copy data from AWS S3 to AWS EMR

Hi,

So far, Talend does not support transferring data directly from S3 to EMR. You have to download the files locally and then push the data to EMR.

You can select the Amazon EMR distribution in the Hadoop components.

Please take a look at my screenshot.

Best regards

Four Stars

Re: Copy data from AWS S3 to AWS EMR

Thanks for the response. However, it looks like we need to have a subscription to one of the Talend solutions with big data and our subscription is for Talend Data Management Platform 6.2.1. So, does this mean we won't be able to connect to Amazon EMR using the components that we have access to?

Moderator

Re: Copy data from AWS S3 to AWS EMR

Hi,

The tHDFSOutput component is available in Talend Open Studio for Big Data.

So far, there is no specific output component for AWS EMR.

Best regards

Sabrina