One Star

connect to HAWQ database

Hi,
How can I connect to a HAWQ database? Should I use tGreenplumBulkExec or tHive? I just need a simple scenario: 1) set up a connection, 2) create a table or update data in an existing table.
I created the job as tGreenplumConnection -> tGreenplumGPLoad, and it failed with an error saying it can't find the gpload.exe file.
The other scenario is tGreenplumConnection -> tGreenplumBulkExec, which fails with "force not null columns must have a value".
Can someone help me with this?
Thank you in advance.
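For reference, this is roughly the kind of thing I am trying to do, sketched here as plain JDBC (assuming HAWQ accepts PostgreSQL-protocol connections via the standard Postgres JDBC driver; the host, credentials and table name are just placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Minimal sketch: open a connection, create a table, insert a row.
// Host, port, database, user, password and table name are placeholders.
public class HawqQuickTest {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:postgresql://hawq-master.example.com:5432/mydb";
        try (Connection conn = DriverManager.getConnection(url, "gpadmin", "secret");
             Statement stmt = conn.createStatement()) {
            stmt.execute("CREATE TABLE test_tbl (id int, name varchar(50))");
            stmt.execute("INSERT INTO test_tbl VALUES (1, 'hello')");
        }
    }
}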

8 REPLIES
Four Stars

Re:connect to HAWQ database

Hi Sunil,
Where did you get this HAWQ database? Even Google couldn't find it... Can you please tell me more about HAWQ?
Thanks
Vaibhav

One Star

Re:connect to HAWQ database

HAWQ is part of EMC's Hadoop ecosystem and is unique to EMC. It provides a high degree of SQL standard compliance: the benefits of Hadoop without having to completely retrain relational folks. It is not Greenplum; it is a sort of Greenplum wrapper around HDFS, which may be why the Greenplum connectors are not working. I have never used it, just seen some demos.
http://www.emc.com/about/news/press/2013/20130225-04.htm

Four Stars

Re:connect to HAWQ database

Did you manage to connect to HAWQ using Talend?

One Star

Re:connect to HAWQ database

Yes, I was able to connect to HAWQ. Are you still facing the issue? What is the error?

One Star

Re:connect to HAWQ database

I used the tGreenplumGPLoad component to load data into Hadoop, but the job is a bit tricky.
You can configure tGreenplumGPLoad in Talend and fire queries on that DB, but you cannot load directly, because this component uses the parallel loading feature of Greenplum. Greenplum parallel loading requires gpfdist (the external table loading feature) or a wrapper called gpload. In this case gpload is used (in the Advanced settings tab of the tGreenplumGPLoad component). The problem is that gpload is a Python script, so it is generally run in a Unix environment. If you are able to create an executable gpload.exe from it, then you can run it on Windows.
My suggestion is to export the job to a Unix environment and unzip the job folder there. You'll see <jobname>.sh inside it, and you can run that shell script; it will in turn run the gpload.py script. To do that, I would suggest making contexts for all the database connection strings in the Talend job and for the YAML control file (in the Advanced settings tab of the tGreenplumGPLoad component); a minimal control file is sketched below. Save the job, export it, and then just run the script.
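For reference, this is roughly what a gpload YAML control file can look like; the database name, host, source file and target table are placeholders, and the format/delimiter should match your own data:

VERSION: 1.0.0.1
DATABASE: mydb
USER: gpadmin
HOST: mdw.example.com
PORT: 5432
GPLOAD:
   INPUT:
    - SOURCE:
         FILE:
           - /data/stage/customers.txt
    - FORMAT: text
    - DELIMITER: '|'
    - ERROR_LIMIT: 25
   OUTPUT:
    - TABLE: public.customers
    - MODE: insert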
If you are still facing an issue, let me know; I can also upload images.
Happy Learning...

Four Stars

Re:connect to HAWQ database

We ended up creating our own GreenplumGPLoad component that accepts the column descriptors and creates the YAML automatically, roughly along the lines of the sketch below.
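Purely as an illustration of the idea (not the actual component), a helper like this could build the control file from the column list; all class, method and default names here are made up:

import java.util.List;

// Hypothetical sketch: build a gpload YAML control file string from a list of
// column names. The 'text' column type and the format defaults are illustrative only.
public class GploadYamlBuilder {

    public static String buildControlFile(String database, String user, String host,
                                           int port, String table, String sourceFile,
                                           List<String> columns) {
        StringBuilder yaml = new StringBuilder();
        yaml.append("VERSION: 1.0.0.1\n");
        yaml.append("DATABASE: ").append(database).append("\n");
        yaml.append("USER: ").append(user).append("\n");
        yaml.append("HOST: ").append(host).append("\n");
        yaml.append("PORT: ").append(port).append("\n");
        yaml.append("GPLOAD:\n");
        yaml.append("   INPUT:\n");
        yaml.append("    - SOURCE:\n");
        yaml.append("         FILE:\n");
        yaml.append("           - ").append(sourceFile).append("\n");
        yaml.append("    - COLUMNS:\n");
        for (String column : columns) {
            // column descriptors taken from the Talend schema; types simplified to text
            yaml.append("        - ").append(column).append(": text\n");
        }
        yaml.append("    - FORMAT: text\n");
        yaml.append("    - DELIMITER: '|'\n");
        yaml.append("   OUTPUT:\n");
        yaml.append("    - TABLE: ").append(table).append("\n");
        yaml.append("    - MODE: insert\n");
        return yaml.toString();
    }
}

The idea is that the generated string gets written to a .yml file, which gpload is then pointed at with its -f option.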

One Star

Re:connect to HAWQ database

Can you please share the component that creates the YAML file automatically?
Thanks
Maya

One Star

Re:connect to HAWQ database

How did you get the GPLoad component to work at all? I downloaded the gpload tools for Windows from Pivotal, and now my gpload component successfully pings my server and then does nothing more - it doesn't even try to load data from a file. If I give it a non-existent file, table or schema, it does not complain one bit. When I run the GPLoad component, this is what I get:
connected
Setting up PATH for Greenplum Loaders
LOADERS environment variables configured successfully.
disconnected
To specify - I set the gpload path in the Advanced settings to this (otherwise it says 'cannot find file specified'):
C:/Programfile.../greenplum-loaders-.../greenplum_loaders_path.bat
So what on earth do I do to get it to actually load a file? I have my own .yml file that I created when I used to run gpload manually, but I would much prefer to get Talend to do it (so I know my .yml file and the gpload tools work, as I have used them manually for the same data, table, etc.).
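For reference, this is roughly the manual run that works for me from a Windows command prompt (the control file and log file names are placeholders), which is what I'd like Talend to reproduce:

REM set up the loaders environment using the same greenplum_loaders_path.bat as above,
REM then run gpload directly against the control file with verbose logging
call greenplum_loaders_path.bat
gpload -f my_control.yml -l gpload.log -v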