Extract Data from Oracle and Load Hive Tables using Batch Jobs

Six Stars

Extract Data from Oracle and Load Hive Tables using Batch Jobs

Hello Community,

 

I have a requirement where I need to extract data from an Oracle database to CSV files using a Standard Job, and after extraction these CSV files are to be loaded into Hive tables using Big Data Batch Jobs. Currently, all the CSV files sit on an FTP server.
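To illustrate what I mean, stripped of the Talend components the extraction step amounts to something like this (a rough Python sketch; the cx_Oracle driver, connection details, and table name are placeholders, not my actual setup):

# Rough sketch of the Oracle-to-CSV extraction; connection details
# and the table name are placeholders.
import csv
import cx_Oracle

conn = cx_Oracle.connect("user", "password", "dbhost:1521/ORCLPDB")
cursor = conn.cursor()
cursor.execute("SELECT * FROM source_table")

with open("source_table.csv", "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow([col[0] for col in cursor.description])  # header row
    writer.writerows(cursor)                                 # data rows

conn.close()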

 

I'm quite new to the Batch Job concept. Could you please help me here?

Employee

Re: Extract Data from Oracle and Load Hive Tables using Batch Jobs

Hi,

 

The batch loading concept is very similar to the DI batch load flow, but here the Spark engine drives the flow and the components change according to the Spark layout.

 

Since you are outputting the data to Hive, I would suggest going through the Big Data job properties for this component, and also through the sample job in the link below.

 

https://help.talend.com/reader/KxVIhxtXBBFymmkkWJ~O4Q/xd_IX0AdYKc3dTF9akwRVw
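For reference, once the CSV files are in HDFS, the Hive side of the load usually boils down to laying an external table over the CSV directory. A minimal sketch outside Talend, assuming PyHive is available (the host, table, columns, and HDFS path below are only examples):

# Sketch of the Hive load, assuming the CSV files are already in HDFS;
# host, table name, columns, and path are examples only.
from pyhive import hive

conn = hive.connect(host="hive-server", port=10000)
cur = conn.cursor()

# Lay an external table over the directory holding the CSV files.
cur.execute("""
    CREATE EXTERNAL TABLE IF NOT EXISTS staging_orders (
        order_id INT,
        amount   DOUBLE
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LOCATION '/data/staging/orders'
""")

cur.close()
conn.close()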

 

Warm Regards,
Nikhil Thampi

Please appreciate our members by giving Kudos for sharing their time on your query. If your query is answered, please mark the topic as resolved :-)


Six Stars

Re: Extract Data from Oracle and Load Hive Tables using Batch Jobs

Hi @nikhilthampi,

 

Many thanks for the prompt response!

 

I went through the URL you've given. However, my requirement is slightly different. As I'm currently using Talend Big Data Enterprise v6.3.1, I do not see any FTP component when creating a Big Data Batch Job (not sure if it's available in v7.1.1), so I'm unable to proceed: before anything else, I have to pull the CSV files from the FTP server.

 

Is there a way to handle this situation?

 

Thanks in advance!

 

Best Regards,

Dipanjan

 

Employee

Re: Extract Data from Oracle and Load Hive Tables using Batch Jobs

@dipanjan93

 

You should not use a Big Data job for FTP processing. There should be a DI job to perform the FTP activity, and once that job is complete, a BD job can perform the remaining tasks.
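For clarity, the FTP step in the DI job (typically tFTPConnection plus tFTPGet) is conceptually just a download loop. In plain Python it would look roughly like this (host, credentials, and paths are placeholders):

# Plain-Python equivalent of the FTP fetch a DI job would perform;
# host, credentials, and paths are placeholders.
import ftplib
import os

ftp = ftplib.FTP("ftp.example.com")
ftp.login("user", "password")
ftp.cwd("/exports")

os.makedirs("landing", exist_ok=True)
for name in ftp.nlst():
    if name.endswith(".csv"):
        with open(os.path.join("landing", name), "wb") as f:
            ftp.retrbinary("RETR " + name, f.write)
ftp.quit()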

 

In the execution plan, you can orchestrate the DI and BD jobs so that they start one after another.
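If you ever need to mimic this sequencing outside the TAC execution plan, chaining the exported job launch scripts works the same way (a sketch; the script paths are placeholders for your exported jobs):

# Sketch of chaining exported DI and BD job scripts outside TAC;
# the script paths are placeholders.
import subprocess

# Run the DI job (FTP fetch) first; check=True aborts the chain on failure.
subprocess.run(["./ftp_fetch_job/ftp_fetch_job_run.sh"], check=True)

# Only if the DI job succeeded, run the BD batch job (Hive load).
subprocess.run(["./hive_load_job/hive_load_job_run.sh"], check=True)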

 

Warm Regards,
Nikhil Thampi

Please appreciate our members by giving Kudos for sharing their time on your query. If your query is answered, please mark the topic as resolved :-)


