How to load data from PostgreSQL to Hive


How to load data from PostgreSQL to Hive

I am new to Talend, and I've been tasked (for a database course project) with loading data from a database (PostgreSQL) and transferring it to a data warehouse (Hive) via ETL; we were advised to use Talend. However, I'm not sure how to transfer data from PostgreSQL to Hive, since there is no tHiveInput component to "map" the data directly from PostgreSQL to Hive. I've also tried exporting the data from PostgreSQL to a .csv file and loading it with a tHiveLoad component, but that didn't work either because I'm unable to connect the tFileOutput component to the tHiveLoad component.

So I'm unsure what to do. TL;DR: I'm not sure how to load data from PostgreSQL to Hive via Talend.
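For context, the CSV route described above ends with Hive running a `LOAD DATA` statement on the exported file, which is roughly what tHiveLoad issues for you. As an assumption to check in your Studio version, tHiveLoad appears to take no row input, so it would normally be chained after the file-writing subjob with an OnSubjobOk trigger rather than a row link. The minimal Python sketch below only builds the HiveQL text; the file path and table name are hypothetical, and the target table must already exist with a delimited row format matching the file.

```python
def hive_load_statement(path: str, table: str, local: bool = True,
                        overwrite: bool = False) -> str:
    """Build a HiveQL LOAD DATA statement for an already-written delimited file.

    `path` and `table` are illustrative; the table must already exist with a
    ROW FORMAT DELIMITED definition that matches the file's separator.
    """
    # Escape backslashes and single quotes so the path stays a valid Hive string literal.
    quoted = path.replace("\\", "\\\\").replace("'", "\\'")
    parts = ["LOAD DATA"]
    if local:
        parts.append("LOCAL")  # read from a local filesystem path instead of HDFS
    parts.append(f"INPATH '{quoted}'")
    if overwrite:
        parts.append("OVERWRITE")  # replace the table's data instead of appending
    parts.append(f"INTO TABLE {table}")
    return " ".join(parts)


# Hypothetical export file and staging table.
print(hive_load_statement("/tmp/employees.csv", "staging.employees"))
# LOAD DATA LOCAL INPATH '/tmp/employees.csv' INTO TABLE staging.employees
```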

Employee

Re: How to load data from PostgreSQL to Hive

Hi,

 

One option is to read the data from PostgreSQL and push it to the HDFS layer using the tHDFSOutput component.
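To make this first step concrete, here is a minimal Python sketch of the kind of delimited text that would land on HDFS: each PostgreSQL row becomes one line of separator-joined fields. The `;` separator and the `\N` NULL marker are assumptions (the latter is Hive's default text encoding of NULL); match them to whatever you configure on tHDFSOutput and on the Hive table. The sample rows are made up.

```python
from datetime import date
from typing import Iterable, Sequence

NULL_MARKER = r"\N"  # Hive's default text representation of NULL


def to_delimited(rows: Iterable[Sequence[object]], sep: str = ";") -> str:
    """Serialize rows into newline-terminated, separator-delimited text."""
    lines = []
    for row in rows:
        fields = []
        for value in row:
            if value is None:
                fields.append(NULL_MARKER)
                continue
            text = str(value)  # dates become ISO strings such as 2019-07-01
            # No quoting or escaping here, so refuse values that would split a field or row.
            if sep in text or "\n" in text:
                raise ValueError(f"value {text!r} contains the separator or a newline")
            fields.append(text)
        lines.append(sep.join(fields))
    return "".join(line + "\n" for line in lines)


# Made-up rows standing in for a PostgreSQL result set.
rows = [(1, "Alice", date(2019, 7, 1)), (2, None, None)]
print(to_delimited(rows), end="")
# 1;Alice;2019-07-01
# 2;\N;\N
```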

 

Then read the file with a tFileInputDelimited component in a Big Data Batch job and push it into the Hive layer with tHiveOutput.
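On the reading side, tFileInputDelimited applies a schema to each line of the file. The Python sketch below approximates that idea by splitting each line and converting every field according to its column; the `EMPLOYEE_SCHEMA` columns, the `;` separator, and the `\N` NULL marker are illustrative assumptions.

```python
from datetime import date
from typing import Callable, Dict, List, Optional, Sequence, Tuple

NULL_MARKER = r"\N"  # assumed NULL encoding, Hive's default for text tables

# Hypothetical schema: (column name, converter from raw text) pairs.
Schema = Sequence[Tuple[str, Callable[[str], object]]]
EMPLOYEE_SCHEMA: Schema = [("id", int), ("name", str), ("hired", date.fromisoformat)]


def parse_delimited(text: str, schema: Schema, sep: str = ";") -> List[Dict[str, Optional[object]]]:
    """Turn delimited lines into dicts keyed by column name, converting each field."""
    records = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line:
            continue  # ignore blank lines
        fields = line.split(sep)
        if len(fields) != len(schema):
            raise ValueError(f"line {lineno}: expected {len(schema)} fields, got {len(fields)}")
        records.append({
            name: None if raw == NULL_MARKER else convert(raw)
            for (name, convert), raw in zip(schema, fields)
        })
    return records


print(parse_delimited("1;Alice;2019-07-01\n2;\\N;\\N\n", EMPLOYEE_SCHEMA))
# [{'id': 1, 'name': 'Alice', 'hired': datetime.date(2019, 7, 1)}, {'id': 2, 'name': None, 'hired': None}]
```

A schema mismatch raises immediately with the offending line number, which is usually easier to debug than rows silently shifted into the wrong Hive columns.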

 

There are other methods as well, but this one is simple and straightforward, which helps since you are using Talend for the first time.
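One such alternative (an assumption here, not necessarily what the replier had in mind) is to define a Hive external table directly over the HDFS directory that tHDFSOutput writes to. The DDL can then be run from any Hive client, or possibly from a standard-job component such as tHiveRow if your palette has it. The Python sketch below generates that DDL; the PostgreSQL-to-Hive type mapping (including `DECIMAL(38,10)` for unconstrained `numeric`), the table name, and the HDFS path are assumptions for illustration.

```python
from typing import Sequence, Tuple

# Assumed mapping for a few common PostgreSQL types; extend as needed.
PG_TO_HIVE = {
    "smallint": "SMALLINT",
    "integer": "INT",
    "bigint": "BIGINT",
    "real": "FLOAT",
    "double precision": "DOUBLE",
    "numeric": "DECIMAL(38,10)",
    "boolean": "BOOLEAN",
    "text": "STRING",
    "character varying": "STRING",
    "date": "DATE",
    "timestamp without time zone": "TIMESTAMP",
}


def external_table_ddl(table: str, columns: Sequence[Tuple[str, str]],
                       location: str, sep: str = ";") -> str:
    """Build CREATE EXTERNAL TABLE DDL over an HDFS directory of delimited text files."""
    cols = []
    for name, pg_type in columns:
        hive_type = PG_TO_HIVE.get(pg_type.lower())
        if hive_type is None:
            raise ValueError(f"no Hive mapping for PostgreSQL type {pg_type!r}")
        cols.append(f"  `{name}` {hive_type}")
    # Some Hive clients split statements on ';' even inside quotes, so use its octal escape.
    sep_literal = "\\073" if sep == ";" else sep
    return "\n".join([
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (",
        ",\n".join(cols),
        ")",
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{sep_literal}'",
        "STORED AS TEXTFILE",
        f"LOCATION '{location}'",
    ])


# Hypothetical table and HDFS directory (a directory, since Hive reads every file in it).
print(external_table_ddl("staging.employees",
                         [("id", "integer"), ("name", "text"), ("hired", "date")],
                         "/user/talend/employees"))
```

Because the table is external, dropping it later leaves the HDFS files in place.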

 

Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved :-)

Five Stars

Re: How to load data from PostgreSQL to Hive

Hi Nikhil,

I'm trying to implement your solution. What I have right now is simply my Postgres connection -> tHDFSOutput. However, I'm a bit confused about what to do next: should I connect my tHDFSOutput component to a tFileInput? Or is there another step between HDFS and tFileInput?

 

Also, I was browsing my Hive components, and I don't have a tHiveOutput component. I'm not sure why.

Employee

Re: How to load data from PostgreSQL to Hive

Hi,

 

You will have to create a separate Big Data Batch job to do the rest. The HDFS file can be read by the tFileInputDelimited component in that Big Data Batch job. Once you have created both jobs, it's a matter of orchestrating them by calling one after the other from a parent Talend standard job.
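The orchestration idea can be sketched in plain Python: a parent runs each child job in turn and stops at the first failure, roughly what chaining tRunJob components with OnSubjobOk triggers gives you in a parent standard job. The job names and the exit-code convention (0 means success) are hypothetical; in Talend itself this is set up in the designer rather than written by hand.

```python
from typing import Callable, List, Tuple

# (job name, callable returning an exit code where 0 means success)
Job = Tuple[str, Callable[[], int]]


def run_sequentially(jobs: List[Job]) -> List[Tuple[str, int]]:
    """Run jobs one after another and stop after the first non-zero exit code."""
    results = []
    for name, job in jobs:
        code = job()
        results.append((name, code))
        if code != 0:
            break  # later jobs never start, like an OnSubjobOk link that does not fire
    return results


# Hypothetical stand-ins for the two child jobs.
def stage_to_hdfs() -> int:   # standard job: PostgreSQL input -> tHDFSOutput
    return 0


def load_into_hive() -> int:  # batch job: tFileInputDelimited -> tHiveOutput
    return 0


print(run_sequentially([("stage_to_hdfs", stage_to_hdfs), ("load_into_hive", load_into_hive)]))
# [('stage_to_hdfs', 0), ('load_into_hive', 0)]
```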

 

Warm Regards,
Nikhil Thampi

Two Stars

Re: How to load data from PostgreSQL to Hive

Do you know if this is possible with Open Studio for Big Data, or only with the Big Data Platform?

 

I'm trying it in Open Studio for Big Data, but there is no combo box to convert a job into a Big Data Batch job.

 

Thanks in advance.

Employee

Re: How to load data from PostgreSQL to Hive

Hi,

 

I was referring to a flow like the one below. You do not have to convert any standard job into a Big Data Batch job.

[Three screenshots of the job flow]

 

Here the standard job first loads the data into the HDFS layer, and the Big Data Batch job is called afterwards, so the jobs run sequentially.

 

Warm Regards,
Nikhil Thampi

Five Stars

Re: How to load data from PostgreSQL to Hive

Hi Nikhil,

I'm trying to set up my job like yours, but the problem is that we don't have a tHiveOutput component. I've attached the options Talend offers me when I drag a connection from tFileInputDelimited in my tFileInputDelimited -> tHiveOutput job. As you can see, there's no tHiveOutput component.

Employee

Re: How to load data from PostgreSQL to Hive

Hi,

 

Could you please go to File -> Edit Project Properties and check whether the component has been added to the Palette under Big Data Spark jobs?

 

[Screenshot]

 

Warm Regards,
Nikhil Thampi

Five Stars

Re: How to load data from PostgreSQL to Hive

Hi,

I don't have a Big Data Batch Job component anywhere. I'm using Talend Open Studio for Big Data 7.11. Is Big Data Batch Job a premium feature or something?

Employee

Re: How to load data from PostgreSQL to Hive

Hi,

 

Apologies. This feature is only available in the subscription version.

[Screenshot]

 

Warm Regards,
Nikhil Thampi
