Need Suggestion for Best Practice: Big Data Batch Job (with Spark) or Standard Job with tSqoop (with MapReduce)?

One Star


Hi experts,

First of all, I haven't seen a main topic for Big Data discussion like in your old forum; only the BD sandbox is currently available, so I decided to ask here.

Can any of you suggest which approach to choose when we want to do data ingestion from an RDBMS to HDFS/Hive?
I have been considering the two options below; please let me know which one is better (or whether there is a better way):

1. In a Standard Job: tSqoopImport --OnComponentOk--> tHiveLoad
OR
2. In a Big Data Batch Job (Spark): tXXXInput (RDBMS, e.g. Oracle/MSSQL/etc.) --Main--> tFileOutputDelimited (to write to HDFS) --> load into Hive from HDFS

Or does anyone have a better solution? (A rough Spark sketch of option 2 follows below.)
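For concreteness, here is a minimal sketch of what option 2 boils down to in Spark code. This is only conceptual, not Talend's actual generated code, and every URL, credential, path, and table name below is a placeholder assumption, not a value from this thread:

```scala
// A minimal sketch of option 2, assuming Spark 2.x with Hive support.
// All connection details, paths, and table names are placeholder assumptions.
import org.apache.spark.sql.SparkSession

object RdbmsToHiveIngest {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("rdbms-to-hive-ingest")
      .enableHiveSupport() // needed for the Hive step at the end
      .getOrCreate()

    // Step 1: read the source table over JDBC (the tXXXInput part).
    val orders = spark.read
      .format("jdbc")
      .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL") // placeholder
      .option("dbtable", "ORDERS")                           // placeholder
      .option("user", "etl_user")                            // placeholder
      .option("password", sys.env("DB_PASSWORD"))
      .load()

    // Step 2: stage delimited files on HDFS (the tFileOutputDelimited part).
    orders.write
      .option("sep", "|")
      .mode("overwrite")
      .csv("hdfs:///staging/orders") // placeholder path

    // Alternative to the "load from HDFS into Hive" step: Spark can write
    // straight into a Hive table, removing the staging step entirely.
    orders.write.mode("overwrite").saveAsTable("default.orders")

    spark.stop()
  }
}
```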

Huge thanks

Moderator

Re: Need Suggestion for Best Practice: Big Data Batch Job (with Spark) or Standard Job with tSqoop (with MapReduce)?

Hi,

You can import data from an RDBMS into Hadoop, and directly into Hive, using Sqoop alone, without tHiveLoad.
Please take a look at the related scenario in the component reference: TalendHelpCenter:tSqoopImport
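For illustration, tSqoopImport ultimately drives a sqoop import; with the Hive options enabled, the equivalent command line looks roughly like this (the connection details, paths, and table names here are placeholders, not values from this thread):

```sh
# --hive-import creates/loads the Hive table directly, so no separate
# load step is needed. All values below are placeholder assumptions.
sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username etl_user \
  --password-file /user/etl/.db_password \
  --table ORDERS \
  --hive-import \
  --hive-table default.orders \
  --num-mappers 4
```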
Best regards
Sabrina

One Star

Re: Need Suggestion for Best Practice: Big Data Batch Job (with Spark) or Standard Job with tSqoop (with MapReduce)?

Hi @xdshi, thanks for the reply.

Yes, currently I'm using tSqoopImport to bring the data into HDFS, and since the destination is Hive, I then use tHiveLoad.

My main confusion is that, for the same scenario (RDBMS source to Hive), it is also possible, if I'm not mistaken, to use a Big Data Batch Job, which performs the task on the Spark framework. The component flow would be more or less what I described in point 2 above.

So, back to the question: which would be faster (or which is the best practice)?
Using a Standard Job with Sqoop for ingestion from the RDBMS,
or a Big Data Batch Job using the Spark framework?
Correct me if I'm wrong. (One performance consideration is sketched below.)
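One factor that often decides the speed question is read parallelism: Sqoop splits the source table across several mappers by default (--num-mappers), whereas a Spark JDBC read pulls everything through a single connection unless you set the partitioning options. A minimal sketch, reusing the SparkSession from the earlier example; the column name and bounds are illustrative assumptions:

```scala
// Without these options, Spark reads the whole table over one JDBC
// connection. Column name and bounds are illustrative assumptions.
val orders = spark.read
  .format("jdbc")
  .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL")
  .option("dbtable", "ORDERS")
  .option("user", "etl_user")
  .option("password", sys.env("DB_PASSWORD"))
  .option("partitionColumn", "ORDER_ID") // numeric, evenly distributed key
  .option("lowerBound", "1")
  .option("upperBound", "10000000")
  .option("numPartitions", "8")          // 8 concurrent JDBC reads
  .load()
```

In practice, whichever route keeps the database read parallel tends to win, so it is worth benchmarking both against your own source and cluster.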

Thanks
