Extract Data from Oracle and Load Hive Tables using Batch Jobs

Six Stars

Extract Data from Oracle and Load Hive Tables using Batch Jobs

Hello Community,

 

I have a requirement where I need to extract data from an Oracle database to CSV files using a Standard Job, and after extraction these CSV files would be loaded into Hive tables using Big Data Batch Jobs. Currently, all the CSV files sit under an FTP server location.
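For context, the extraction side is conceptually just spooling an Oracle query out to CSV. A rough Python sketch of that step (the connection details, query, and output path below are only placeholders) would be:

    import csv
    import cx_Oracle  # Oracle client library; assumed available on the extract host

    # Placeholder connection details, query and output path.
    conn = cx_Oracle.connect("app_user", "app_password", "dbhost:1521/orclpdb")
    cursor = conn.cursor()
    cursor.execute("SELECT * FROM sales.orders")

    with open("/data/extract/orders.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([col[0] for col in cursor.description])  # header row
        writer.writerows(cursor)  # the cursor yields one row tuple at a time

    cursor.close()
    conn.close()

The real extraction is done with a Standard Job, of course; the sketch is only to show the shape of the data landing as CSV.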

 

I'm quite new to the Batch Job concept. Could you please help me out here?

Employee

Re: Extract Data from Oracle and Load Hive Tables using Batch Jobs

Hi,

 

The batch loading concept is very similar to the DI batch load flow. But here, we are using the Spark engine to drive the flow, and the components will change according to the Spark layout.

 

Since you are using Hive to output the data, I would suggest going through the Big Data job properties for this component, and also through the sample job in the link below.

 

https://help.talend.com/reader/KxVIhxtXBBFymmkkWJ~O4Q/xd_IX0AdYKc3dTF9akwRVw
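For reference, a Big Data Batch job with a Hive output ultimately runs as Spark code on the cluster. A minimal PySpark sketch of the CSV-to-Hive step (it assumes the files have already landed in HDFS, and the path and table name are placeholders) would look roughly like this:

    from pyspark.sql import SparkSession

    # Placeholder HDFS path and Hive table name -- adjust to your environment.
    CSV_PATH = "hdfs:///landing/oracle_extract/*.csv"
    HIVE_TABLE = "staging_db.oracle_extract"

    # Hive support must be enabled so Spark can talk to the Hive metastore.
    spark = (SparkSession.builder
             .appName("csv_to_hive_load")
             .enableHiveSupport()
             .getOrCreate())

    # Read the extracted CSV files (header row assumed, schema inferred).
    df = (spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv(CSV_PATH))

    # Write the data into the target Hive table, replacing any previous load.
    df.write.mode("overwrite").saveAsTable(HIVE_TABLE)

    spark.stop()

The Studio generates the equivalent logic for you from the Spark components; the sketch is only to show what the batch job is doing behind the scenes.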

 

Warm Regards,
Nikhil Thampi

Please appreciate our members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved :-)

Six Stars

Re: Extract Data from Oracle and Load Hive Tables using Batch Jobs

Hi @nikhilthampi,

 

Many Thanks for the prompt response!

 

I went through the URL you shared. However, my requirement is slightly different. As I'm currently using Talend Big Data Enterprise v6.3.1, I do not see any FTP component while creating a Big Data Batch Job (not sure if it's available in v7.1.1), due to which I'm unable to proceed further, as I first have to extract the CSV files from the FTP server.

 

Is there a way to handle this situation?

 

Thanks in advance!

 

Best Regards,

Dipanjan

 

Employee

Re: Extract Data from Oracle and Load Hive Tables using Batch Jobs

@dipanjan93

 

You should not use a Big Data job for FTP processing. There should be a DI job to perform the FTP activity, and once that job is complete, you can use a BD job to perform the remaining tasks.
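Conceptually, the DI job's FTP step just pulls the CSV files down to a landing directory before the BD job starts. A rough Python equivalent of that step (host, credentials, and paths below are placeholders) would be:

    import os
    from ftplib import FTP

    # Placeholder connection details and paths -- adjust to your environment.
    FTP_HOST = "ftp.example.com"
    FTP_USER = "ftp_user"
    FTP_PASS = "ftp_password"
    REMOTE_DIR = "/exports/oracle"
    LOCAL_DIR = "/data/landing/oracle_extract"

    os.makedirs(LOCAL_DIR, exist_ok=True)

    ftp = FTP(FTP_HOST)
    ftp.login(FTP_USER, FTP_PASS)
    ftp.cwd(REMOTE_DIR)

    # Download every CSV file found in the remote directory.
    for name in ftp.nlst():
        if name.lower().endswith(".csv"):
            with open(os.path.join(LOCAL_DIR, name), "wb") as out:
                ftp.retrbinary("RETR " + name, out.write)

    ftp.quit()

In the Studio, the DI FTP components cover this for you; the point is simply that the file transfer belongs in the DI job, not in the Spark batch job.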

 

In the Execution Plan, you can orchestrate the DI and BD jobs so that they start one after another.
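If you are not driving this from an Execution Plan, the same sequencing can be approximated by chaining the launch scripts of the built jobs. A minimal sketch (the script paths are placeholders) would be:

    import subprocess

    # Placeholder paths to the launch scripts produced when the jobs are built.
    DI_FTP_JOB = "/opt/talend/jobs/di_ftp_get/di_ftp_get_run.sh"
    BD_HIVE_JOB = "/opt/talend/jobs/bd_hive_load/bd_hive_load_run.sh"

    # Run the DI job first; check=True raises an error if it fails,
    # so the BD job only starts after a successful FTP fetch.
    subprocess.run([DI_FTP_JOB], check=True)
    subprocess.run([BD_HIVE_JOB], check=True)

Either way, the key point is that the BD job must only start once the DI job has finished successfully.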

 

Warm Regards,
Nikhil Thampi

Please appreciate our members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved :-)
