Difference between Standard jobs and Big Data jobs in Talend


Hi Folks,
I'm new to the Talend Big Data edition, so I'm trying to understand the difference between Standard jobs and Big Data jobs from an Extraction, Processing and Loading perspective. Please confirm whether the understanding below is correct:
1. Below is a Standard job to extract 20 million records from one Impala table -> Cleanse -> Load into another Impala table.
Extraction -> Happens by querying the Hadoop cluster (Impala query processing) into the Talend server
Processing -> The cleansing of the 20 million rows happens on the Talend server
Loading -> The processed records are then inserted into the Hadoop cluster (bulk inserts)
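If that understanding is right, the Standard-job pattern boils down to a client-side loop like the following. This is a minimal Python sketch only: a real Talend Standard job generates Java, and `fetch_rows`, `bulk_insert`, and the cleanse rule here are made-up stand-ins for the Impala input/output components.

```python
# Sketch of the Standard-job data flow: rows are pulled from the cluster to
# the Talend server, cleansed there, then bulk-inserted back.
# fetch_rows / bulk_insert are hypothetical stand-ins for Impala JDBC I/O.

def cleanse(row):
    """Example cleanse rule: trim strings and turn empty strings into None."""
    return {k: (v.strip() or None) if isinstance(v, str) else v
            for k, v in row.items()}

def run_standard_job(fetch_rows, bulk_insert, batch_size=10_000):
    batch = []
    for row in fetch_rows():        # extraction: rows stream to the Talend server
        batch.append(cleanse(row))  # processing: happens in this local JVM/process
        if len(batch) >= batch_size:
            bulk_insert(batch)      # loading: bulk insert back into the cluster
            batch = []
    if batch:
        bulk_insert(batch)
```

The key point the sketch makes: every one of the 20 million rows passes through the machine running the job.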

2. Below is a Big Data job to extract 20 million records from one Impala table with a lookup -> Cleanse -> Load into another Impala table.
Extraction -> Happens by querying the Hadoop cluster (Impala query processing) into the Talend server
Processing -> The cleansing of the 20 million rows happens in a Hadoop MapReduce job, and no data comes to the Talend server
Loading -> The processed records are then inserted into the Hadoop cluster (bulk inserts)

3. The biggest difference between using processing components in a Standard job and in a Big Data job is that the data comes to the Talend server for processing in a Standard job, whereas it stays in the cluster in a Big Data job.
Regards,
Vas

Re: Difference between Standard jobs and Big Data jobs in Talend

Hi Experts,
Could anyone please reply?

Re: Difference between Standard jobs and Big Data jobs in Talend

The images you provided appear identical (both labeled Standard Job), so I'm not entirely sure of the differences. Instead, I'll summarize the current job types as of 6.1.1 as I understand them:
1. Standard - This is just a Java process. It can access many data sources via JDBC, HDFS, etc., but the main process executes in a single JVM.
2. Big Data Streaming - This is for real-time-ish/micro-batch work; the jobs are sent to Spark or Storm for execution.
3. Big Data Batch - Uses either MR1/YARN for MapReduce execution, or Spark for execution (with or without YARN, depending on the platform).
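To make the distinction in types 2 and 3 concrete: in the Big Data cases the transformation logic is shipped to the cluster as functions, rather than the data being shipped back to the job's JVM. A rough Python sketch of that mapper-style contract (function and field names are illustrative, not Talend's generated code):

```python
# In a MapReduce/Spark job, only this function travels to the cluster;
# each worker applies it to its local partition of the data, so the rows
# never pass through the machine that submitted the job.

def cleanse_mapper(record):
    """Per-record cleanse step, executed on the cluster workers."""
    name = record.get("name", "").strip()
    return {**record, "name": name or None}

def run_on_partition(records):
    """What each worker effectively does with its own slice of the input."""
    return [cleanse_mapper(r) for r in records]
```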
Hope that helps!
P.S. Impala is its own thing, so I would expect that part to behave about the same if on the same hardware.

Re: Difference between Standard jobs and Big Data jobs in Talend

Hi,
Standard jobs are used for data ingestion, moving data from different sources into your Hadoop cluster.
You can then use either a Big Data Streaming or a Big Data Batch job to do the processing, using the MapReduce or Spark engines.
In your case, all 3 steps should be in a Big Data Batch job (unless you need real-time processing, in which case use a Streaming job).
Any questions, please let me know.
Amine

Re: Difference between Standard jobs and Big Data jobs in Talend

Hello Experts , 
I am new to Talend and working on Talend Big Data Platform version 6.1.1. I am trying to find out whether I can use a Big Data Batch job to read from S3 and write to RDS using Spark on Amazon EMR.
So far, my understanding is that the S3 components are available in a Standard job but not in a Big Data Batch job. I do understand that reading from S3 and writing to RDS is possible through a Standard job, but I want to use Spark's capabilities here for better performance, hence the choice of a Big Data Batch job. Could anyone please clarify?
Thanks in advance.
Aarti
