Four Stars

Cloudera on AWS and Talend in Local system?

Hi,

 

I am planning to install Cloudera on AWS and Talend on my local system (laptop).

Will I be able to read and write files on AWS through Talend?

 

I would appreciate it if someone could share some articles or related documents.

 

Regards,

Prasad

2 REPLIES
Six Stars

Re: Cloudera on AWS and Talend in Local system?

Prasad, 

 

I had really good luck with the Sandbox on VirtualBox. I was able to read and write data on S3 from it.

 

https://www.talend.com/products/sandbox/

Employee

Re: Cloudera on AWS and Talend in Local system?

Hi Prasad

 

The short answer to your question is yes, you will be able to read and write files on S3 from Talend Jobs running on Cloudera, but there are some options you should consider first.

 

If you are starting your exploration with Talend, I would suggest following Dustin's recommendation and checking out the Sandbox. There is a separate Sandbox discussion group as well. The Sandbox is a VM image that you can download, with Talend and Cloudera pre-installed. It also includes some very nice sample jobs. It is good for exploring Talend functionality, but of course it is not scalable, nor does it run in AWS per se.

 

If or when you are ready to explore Cloudera in AWS, you will need to be careful with your network configuration. If you intend to run the jobs from Talend Studio on your laptop, you will of course need to grant your laptop access to your CDH cluster at the AWS Security Group level (just as you would with EMR). If you wish to run the jobs from a Jobserver up in AWS, then you will need to grant access to the Jobserver in a similar manner. In both cases, keep in mind that Talend Jobservers run in YARN client mode, and the cluster nodes need to talk back to the Talend process. If you are running from Talend Studio on your laptop, that means granting access from your AWS cluster through your own firewall, etc. It can get tricky pretty quickly. And arguably none of that complexity is what you really want to explore.
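As a rough illustration of the Security Group point above, here is a minimal Python sketch you could run from the laptop to see whether the cluster's service ports accept connections at all. It is not part of Talend or Cloudera. The port numbers are the commonly used CDH defaults for the HDFS NameNode and the YARN ResourceManager and are an assumption here, so check them against your own cluster configuration. The function names are made up for this example.

```python
import socket

# Assumed CDH default ports; confirm against your own cluster configuration.
DEFAULT_PORTS = {
    "HDFS NameNode": 8020,
    "YARN ResourceManager": 8032,
}


def port_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_cluster(host, ports=DEFAULT_PORTS, timeout=3.0):
    """Map each service name to whether its port accepted a connection."""
    return {
        name: port_is_reachable(host, port, timeout)
        for name, port in ports.items()
    }
```

Calling check_cluster with the public DNS name of your master node returns a dictionary of service names to True or False. A False usually points at a Security Group rule or a firewall in between. Note that this only checks the laptop-to-cluster direction; the YARN client mode traffic from the cluster nodes back to your laptop has to be checked separately.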

 

So if CDH is really critical to your exploration, I would suggest that you start with the Sandbox. If CDH is not critical, then you can use the Quickstart, which works with EMR, to explore the same functionality.

 

Ed