I need to copy data from S3 to EMR in AWS. Can someone please let me know which component I can use to write data to EMR? I am using Talend Studio as part of the Data Management Platform.
If we understand your requirement correctly, you can use the tS3Get component to retrieve a file from Amazon S3.
The workflow would be: tS3Connection --> tS3Get (retrieve files from S3 to local disk) --> tFileUnarchive (unzip your file) --> EMR cluster (Amazon EMR). Let us know if that works for you.
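For illustration only: the tFileUnarchive step in that workflow amounts to extracting an archive that is already on local disk. Below is a minimal Python sketch, assuming the zip file has already been fetched from S3 (for example by tS3Get). The `unarchive` helper is hypothetical and is not part of Talend, which generates Java for its jobs.

```python
import tempfile
import zipfile
from pathlib import Path
from typing import List, Optional, Tuple


def unarchive(archive_path: str, target_dir: Optional[str] = None) -> Tuple[str, List[str]]:
    """Extract a locally downloaded zip archive (hypothetical helper).

    Models the tFileUnarchive step only; the archive is assumed to be on
    local disk already, e.g. downloaded from S3 by tS3Get.
    """
    # Use the given directory, or create a fresh temporary one.
    target = Path(target_dir) if target_dir else Path(tempfile.mkdtemp(prefix="s3_download_"))
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as zf:
        names = zf.namelist()
        # Python strips absolute paths and ".." components from member names here.
        zf.extractall(target)
    # The extracted files are what a later step would push to the EMR cluster.
    return str(target), names
```

The returned file names are what the next step in the job would hand to whatever component writes to the cluster; this sketch does not cover that final push.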
Thanks for the response.
We don't really have a need to download files locally. Is it possible to push data from S3 to EMR directly?
Also, in your proposed solution, which component handles the final (EMR cluster) step? The only EMR components that I see are "tAmazonEMRResize", "tAmazonEMRListInstances", and "tAmazonEMRManage".
So far, Talend does not support transferring data directly from S3 to EMR. You have to download the files locally and then push the data to EMR.
You can select Amazon EMR as the distribution in the Hadoop components' settings.
Please take a look at my screenshot.
Thanks for the response. However, it looks like we need a subscription to one of the Talend solutions that include Big Data, and our subscription is for Talend Data Management Platform 6.2.1. So, does this mean we won't be able to connect to Amazon EMR using the components that we have access to?
The tHDFSOutput component is available in Talend Open Studio for Big Data.
So far, there is no dedicated output component for AWS EMR.