How to run Talend Standard job as a JAR over a Spark cluster?



Hi,
I have created a Talend Standard job and want to export it as a JAR file so that I can run it on a Spark cluster using spark-submit.

I'm doing this because I can't find a way to handle a frequently changing schema, and apply transformations to it, in a Talend Spark job.

So, is there any way to export the Standard job as a JAR file and run it on the cluster with spark-submit?

Re: How to run Talend Standard job as a JAR over a Spark cluster?

Hi @Rajesh

You can build the Job (Build Job), which creates a .bat file and a .sh file. You can schedule the Job to run with either the .bat or the .sh file, and the build also gives you access to the JAR file.
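A minimal sketch of launching the built Job from other Java code, picking the .bat or .sh launcher by operating system. The `<JobName>_run.sh` / `<JobName>_run.bat` naming and the directory layout are assumptions based on typical Talend exports; check the actual file names in your build archive.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;

// Sketch: choose the launcher script produced by "Build Job" for the current OS.
// Assumption: the export contains <JobName>_run.bat and <JobName>_run.sh in jobDir.
final class TalendLauncher {

    static List<String> command(Path jobDir, String jobName, String osName) {
        boolean windows = osName.toLowerCase(Locale.ROOT).startsWith("windows");
        if (windows) {
            // cmd.exe /c runs the batch file and then exits
            return List.of("cmd.exe", "/c", jobDir.resolve(jobName + "_run.bat").toString());
        }
        // run the shell script with sh so it does not need the executable bit
        return List.of("sh", jobDir.resolve(jobName + "_run.sh").toString());
    }

    // Starts the script in the job directory, streams its output to this console,
    // and returns the script's exit code.
    static int run(Path jobDir, String jobName) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command(jobDir, jobName, System.getProperty("os.name")));
        pb.directory(jobDir.toFile());
        pb.inheritIO();
        return pb.start().waitFor();
    }
}
```

This is the same thing a scheduler (cron, Windows Task Scheduler) does when it calls the script directly; the Java wrapper only helps if you need the exit code inside another program.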

Regards 
Meet Mahajan


Re: How to run Talend Standard job as a JAR over a Spark cluster?

Yes, you can. However, your job isn't built for Spark: to make full use of the cluster's processing power, its Java code has to call the Spark libraries.

// an example of the Spark classes the job's Java code would have to use
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

It's not a case of "just kick this job/JAR to Spark and off we go!" Sorry.

The Talend components and the generated code would need to be adjusted.
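Mechanically, handing the exported JAR to spark-submit is just a command line. A minimal sketch, assuming a YARN master and placeholder paths; the main class `myproject.myjob_0_1.MyJob` and the `--context=Default` argument are hypothetical examples, so take the real class name and arguments from your exported job. Keep in mind that this only runs the job's `main()` in the Spark driver: a Standard job's generated code doesn't call Spark APIs, so none of its work is spread across executors.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: assemble a spark-submit invocation for a JAR exported from a Talend job.
// Master URL, main class, and paths are hypothetical placeholders.
final class SparkSubmitCommand {

    static List<String> forJar(String master, String mainClass, String appJar,
                               List<String> extraJars, List<String> appArgs) {
        List<String> cmd = new ArrayList<>();
        cmd.add("spark-submit");
        cmd.add("--master");
        cmd.add(master);                 // e.g. "yarn" or "spark://host:7077"
        cmd.add("--class");
        cmd.add(mainClass);              // the job's generated main class
        if (!extraJars.isEmpty()) {
            cmd.add("--jars");           // dependency JARs, comma-separated
            cmd.add(String.join(",", extraJars));
        }
        cmd.add(appJar);                 // the application JAR comes after all options
        cmd.addAll(appArgs);             // everything after the JAR is passed to main()
        return cmd;
    }
}
```

For example, `forJar("yarn", "myproject.myjob_0_1.MyJob", "/opt/jobs/MyJob/MyJob.jar", List.of("/opt/jobs/MyJob/lib/dep.jar"), List.of("--context=Default"))` produces the argument list you would otherwise type by hand.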

If you can use the Spark components, that would make more sense. Alternatively, have a Talend Job that contains some PySpark (Spark Python) code and submit that Python code to Spark. I would opt for this.
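A minimal sketch of the second option: JVM code (for example, Java code inside the Talend Job, which is an assumption about where you would hook it in) passing a PySpark script to spark-submit. The script name, master, and deploy mode are hypothetical placeholders. Python applications don't take `--class`. Also note that cluster deploy mode for Python is supported on YARN but not on a standalone Spark cluster.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Sketch: submit a PySpark script with spark-submit from Java code.
// Script path, master, and deploy mode are hypothetical placeholders.
final class PySparkSubmit {

    static List<String> command(String master, String deployMode, String script, List<String> scriptArgs) {
        List<String> cmd = new ArrayList<>(List.of(
                "spark-submit", "--master", master, "--deploy-mode", deployMode));
        cmd.add(script);          // a .py file: no --class is needed for Python apps
        cmd.addAll(scriptArgs);   // the script receives these as sys.argv[1:]
        return cmd;
    }

    // Blocks until spark-submit exits; a non-zero exit code means the submission failed.
    static int submit(List<String> cmd) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(cmd).inheritIO().start();
        return p.waitFor();
    }
}
```

Checking the exit code lets the surrounding Talend Job fail (or branch) when the Spark application fails, instead of carrying on silently.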


Spark is designed for distributed computing. If you want parallel processing across many CPUs, the Talend job needs to be designed and developed for parallelism so that Spark can spin up containers/executors. However, parallelism sounds nice but doesn't always make sense: keep the overhead in mind, and also data skew (when partitioning, for example, if 70% of the values in a column are null).
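To make the skew point concrete, here is a small simulation in plain Java (not Spark code). It assigns keys to partitions the way Spark's `HashPartitioner` does (a null key goes to partition 0, any other key to its non-negative `hashCode` modulo the partition count) and counts rows per partition. The key values are made up for illustration.

```java
import java.util.List;

// Simulation of hash partitioning, mirroring HashPartitioner's rule:
// null -> partition 0, otherwise non-negative hashCode mod numPartitions.
final class SkewDemo {

    static int partitionFor(Object key, int numPartitions) {
        return key == null ? 0 : Math.floorMod(key.hashCode(), numPartitions);
    }

    // Counts how many rows land in each partition.
    static int[] rowsPerPartition(List<?> keys, int numPartitions) {
        int[] counts = new int[numPartitions];
        for (Object k : keys) {
            counts[partitionFor(k, numPartitions)]++;
        }
        return counts;
    }
}
```

With 70 null keys and 30 distinct non-null keys over 4 partitions, partition 0 receives at least 70 of the 100 rows. One task therefore does most of the work while the other executors sit mostly idle, and adding more executors doesn't help.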


I hope this helps. 
