Talend for Big Data: Scheduling JOB remotely - SSH connection

One Star

Talend for Big Data: Scheduling JOB remotely - SSH connection

Hi all,
I'm pretty new to Talend for Big Data and to Hadoop in general.
I installed Talend for Big Data 5.4 on my Windows server and created a job that runs some queries in Hive and then loads the data into MySQL.
Before starting the job I need to manually start a plink session (using PuTTY; I created a .bat file for it) to open the SSH connection to the Hadoop management node.
Once that's done, the job runs as expected.
But I have two questions about this:
- I think it's possible to run the .bat file that opens the SSH connection at the beginning of the job,
but how can I close the SSH connection once the job is completed?
- In order to schedule the job, is it OK to use the same approach as with TOS for Data Integration (usually I export the jar and schedule it in the Windows Task Scheduler), or should I use Oozie and schedule it on HDFS?

Thanks in advance
Mary
One Star

Re: Talend for Big Data: Scheduling JOB remotely - SSH connection

Hi guys,
Do you have any update for me? :-)
Thanks
Mary
Four Stars

Re: Talend for Big Data: Scheduling JOB remotely - SSH connection

Hi Mary,
On Q1 - the short answer to your question: add an 'exit' command to your batch file. 'exit' works for ssh, so I presume it would work for plink too. As an alternative to using a .bat file at all (which lives outside your job), you can move the .bat file's commands into a tSystem component (a Talend component) and run them from within the job itself.
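To illustrate the idea, here is a hypothetical sketch of what such a plink invocation could look like. The hostname, user and key path are made up, and the command is only echoed (a dry run) so the flags can be checked before wiring it into a .bat file or a tSystem component:

```shell
#!/bin/sh
# Hypothetical sketch: build the plink command the .bat file (or tSystem) would run.
# HOST, USER and KEY are placeholders - substitute your own values.
HOST="hadoop-mgmt-node"        # assumed hostname of the Hadoop management node
USER="mary"                    # assumed SSH user
KEY="C:/keys/hadoop.ppk"       # assumed private key in PuTTY format

# -batch : never prompt interactively (fail instead) - important for scheduled jobs
# "exit" : the remote command; the session closes as soon as it finishes,
#          so no SSH connection is left open after the job completes
CMD="plink -batch -ssh $USER@$HOST -i $KEY exit"

# Dry run: print the command instead of executing it
echo "$CMD"
```

The key point is that when plink is given a remote command, the session ends as soon as that command returns, so nothing needs to be closed manually afterwards.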
On Q2 - since the job works on your local machine, deploying it locally and running it with the Windows Task Scheduler works fine. That's assuming your machine will be running whenever the job needs to execute. Because your Hadoop server is probably always on, it may be better to run the job there.
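For the Task Scheduler route, the .bat launcher that Talend exports with the job can be registered via schtasks. A hypothetical sketch (task name, launcher path and schedule are all made up, and the command is only echoed here rather than executed):

```shell
#!/bin/sh
# Hypothetical sketch: register an exported Talend job with the Windows Task Scheduler.
# Task name and launcher path are placeholders for illustration only.
TASK_NAME="TalendHiveToMySQL"
LAUNCHER="C:/talend/jobs/hive_to_mysql/hive_to_mysql_run.bat"

# /sc daily /st 02:00 : run the job every day at 02:00
CMD="schtasks /create /tn $TASK_NAME /tr $LAUNCHER /sc daily /st 02:00"

# Dry run: print the command rather than executing it on this machine
echo "$CMD"
```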
In either case, since you're going to be building more and more Talend jobs (I presume :-)), you'll want to decide on an architecture for managing them. Since you don't have the enterprise version of Talend (which comes with the Admin Console), you probably want to set up QA and Prod servers with identical file structures for testing your jobs and putting them into production. With such a setup you're not burdening your Hadoop servers unnecessarily, and you have a separate 'Talend' deployment process that you can manage - especially useful if you end up building non-Hadoop-related Talend jobs.
Whatever you do, don't forget to add the directory containing the plink executable to the PATH environment variable on any machine that runs your plink-dependent jobs!
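As a sketch of that last point: on Windows you'd append the PuTTY directory to PATH via the system environment settings; the same idea expressed on Linux looks like this (the /opt/putty/bin location is made up for the example):

```shell
#!/bin/sh
# Hypothetical sketch: make sure the directory holding plink is on PATH
# so scheduled jobs can find it. /opt/putty/bin is a made-up location.
PLINK_DIR="/opt/putty/bin"
export PATH="$PATH:$PLINK_DIR"

# Verify the directory is now part of PATH
case ":$PATH:" in
  *":$PLINK_DIR:"*) echo "plink directory is on PATH" ;;
  *)                echo "plink directory is missing" ;;
esac
```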
Four Stars

Re: Talend for Big Data: Scheduling JOB remotely - SSH connection

And here's another post on using Oozie - some considerations to keep in mind, should they apply to your current job design: http://www.talendforge.org/forum/viewtopic.php?id=33432
One Star

Re: Talend for Big Data: Scheduling JOB remotely - SSH connection

Thanks a lot willm!
I'm going to try your suggestions now.