Difference between Standard jobs and Big Data jobs in Talend

Hi Folks,
I'm new to the Talend Big Data edition and am trying to understand the difference between Standard jobs and Big Data jobs from an extraction, processing, and loading perspective. Please confirm whether the understanding below is correct:
1. Below is a Standard job to extract 20 million records from one Impala table -> Cleanse -> Load into another Impala table (see the JDBC sketch after this list):
Extraction -> Happens by querying the Hadoop cluster (Impala query processing) and pulling the data into the Talend server
Processing -> The cleansing of the 20 million rows happens on the Talend server
Loading -> The processed records are then inserted into the Hadoop cluster (bulk inserts)

2. Below is a Big Data job to extract 20 million records from one Impala table with a lookup -> Cleanse -> Load into another Impala table:
Extraction -> Happens by querying the Hadoop cluster (Impala query processing) and pulling the data into the Talend server
Processing -> The cleansing of the 20 million rows happens in a Hadoop MapReduce job, and no data comes to the Talend server
Loading -> The processed records are then inserted into the Hadoop cluster (bulk inserts)

3. The biggest difference when using processing components in a Standard job versus a Big Data job is that the data comes to the Talend server for processing in a Standard job, whereas in a Big Data job it stays on the cluster.
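To make point 1 concrete, here is a rough sketch, in plain JDBC, of what a Standard job's generated code boils down to. The connection URL, table and column names, and the cleansing rule are placeholders for illustration, not actual Talend-generated code:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class StandardJobSketch {
    public static void main(String[] args) throws SQLException {
        // Placeholder Impala JDBC URL; in a real job this comes from the connection metadata.
        String url = "jdbc:impala://cluster-host:21050/default";

        try (Connection src = DriverManager.getConnection(url);
             Connection dst = DriverManager.getConnection(url);
             Statement extract = src.createStatement();
             // Extraction: rows stream from the cluster into this JVM (the Talend job server).
             ResultSet rs = extract.executeQuery("SELECT id, name FROM source_table");
             PreparedStatement load = dst.prepareStatement(
                     "INSERT INTO target_table (id, name) VALUES (?, ?)")) {

            while (rs.next()) {
                // Processing: the cleansing happens here, row by row, on the Talend server.
                String cleansed = rs.getString("name").trim().toUpperCase();

                // Loading: processed rows are batched up and pushed back to the cluster.
                load.setLong(1, rs.getLong("id"));
                load.setString(2, cleansed);
                load.addBatch();
            }
            load.executeBatch();
        }
    }
}

The key point is that every one of the 20 million rows passes through this single JVM on the Talend server.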
Regards,
Vas

Re: Difference between Standard jobs and Big Data jobs in Talend

Hi Experts,
Could anyone please reply?

Re: Difference between Standard jobs and Big Data jobs in Talend

The images you provided appear identical (both labeled Standard Job), so I'm not entirely sure of the differences. I'll summarize the current types of jobs as of 6.1.1 as I understand them:
1. Standard - This is just a Java process. It can access many data sources via JDBC, HDFS, etc., but the main process just executes in a JVM.
2. Big Data Streaming - This is for real-time-ish/micro-batch work; the jobs are sent to Spark or Storm for execution.
3. Big Data Batch - Uses either MR1 (NameNode, etc.)/YARN for MapReduce execution, or Spark for execution (with or without YARN, depending on the platform).
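To make the Batch case concrete, a Big Data Batch job running on Spark boils down to something along these lines. This is only a hand-written sketch with placeholder table and column names, not what the Studio actually generates; the point is that the transformation runs as Spark tasks on the cluster, not in the job server's JVM:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.trim;
import static org.apache.spark.sql.functions.upper;

public class BigDataBatchSketch {
    public static void main(String[] args) {
        // The machine that submits this application does no row-level work;
        // the executors on the cluster do.
        SparkSession spark = SparkSession.builder()
                .appName("cleanse-sketch")
                .enableHiveSupport() // placeholder: assumes the tables are registered in the metastore
                .getOrCreate();

        // Extraction and processing are expressed as transformations, executed on the cluster.
        Dataset<Row> cleansed = spark.table("source_table")
                .withColumn("name", upper(trim(col("name"))));

        // Loading: written out in parallel by the executors.
        cleansed.write().mode("append").saveAsTable("target_table");

        spark.stop();
    }
}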
Hope that helps!
P.S. Impala is its own thing, so I would expect that part to behave about the same if it's on the same hardware.

Re: Difference between Standard jobs and Big Data jobs in Talend

Hi,
Standard jobs are used for data ingestion from different sources into your Hadoop cluster.
Then you can use either a Big Data Streaming or a Big Data Batch job to do the processing, using the MapReduce or Spark engines.
In your case, all three steps should be in a Big Data Batch job (unless you need real time, in which case you have to use Streaming).
Any questions, please let me know.
Amine

Re: Difference between Standard jobs and Big Data jobs in Talend

Hello Experts,
I am new to Talend and working on Talend Big Data Platform version 6.1.1. I am trying to explore whether I can use a Big Data Batch job to read from S3 and write to RDS using Spark on Amazon EMR.
So far, my understanding is that the S3 components are available for a Standard job but not for a Big Data Batch job. I do understand that reading from S3 and writing to RDS is possible through a Standard job, but I want to use Spark's capabilities here for better performance, hence choosing a Big Data Batch job. Could anyone please clarify?
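Roughly, what I am hoping the Big Data Batch job would translate to under the hood is something like the Spark sketch below. The bucket path, JDBC URL, credentials, and table names are just placeholders, and I have not confirmed which Studio components expose this:

import java.util.Properties;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class S3ToRdsSketch {
    public static void main(String[] args) {
        // On EMR the S3 filesystem and IAM credentials are normally configured already.
        SparkSession spark = SparkSession.builder()
                .appName("s3-to-rds-sketch")
                .getOrCreate();

        // Read from S3 in parallel across the EMR executors (path is a placeholder).
        Dataset<Row> input = spark.read()
                .option("header", "true")
                .csv("s3://my-bucket/input/");

        // Write to RDS over JDBC (URL, table, and credentials are placeholders).
        Properties props = new Properties();
        props.setProperty("user", "db_user");
        props.setProperty("password", "db_password");

        input.write()
                .mode("append")
                .jdbc("jdbc:mysql://my-rds-endpoint:3306/mydb", "target_table", props);

        spark.stop();
    }
}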
Thanks in advance.
Aarti