How can I connect to a HAWQ database? Is it done using tGreenplumBulkExec or using tHive? I just need a simple scenario such as: 1) set up a connection, 2) create a table or update data in an existing table.
I created a job as tGreenplumConnection -> tGreenplumGPLoad, but it failed with an error saying it can't find the gpload.exe file.
The other scenario is tGreenplumConnection -> tGreenplumBulkExec, which reports: force not null columns must have a value
Can someone help me with this?
Thank you in advance.
HAWQ is part of EMC's Hadoop ecosystem and is unique to EMC. It provides a high degree of SQL standard compliance, so you get the benefits of Hadoop without having to completely retrain relational folks. It is not Greenplum; it is a sort of Greenplum wrapper around HDFS, which may be why the Greenplum connectors are not working. I have never used it, just seen some demos.
I used the tGreenplumGPLoad component to load data into Hadoop, but the job is a bit tricky.
You can configure tGreenplumGPLoad in Talend and run queries against that DB, but you cannot load data directly, because this component relies on the parallel loading feature of Greenplum. Greenplum parallel loading requires either gpfdist (through the external table loading feature) or a wrapper around it called gpload. In this case gpload is used (see the Advanced settings tab of the tGreenplumGPLoad component). The problem is that gpload is a Python script, so it is generally run in a Unix environment. If you are able to build an executable gpload.exe from it, then you can run it on Windows.
My suggestion is to export the job to a Unix environment and unzip the job folder there. Inside it you will see <jobname>.sh, which you can run there; it will then use the gpload.py script. To make that work, I would suggest creating context variables for all the database connection strings in your Talend job and for the YAML control file (in the Advanced settings tab of the tGreenplumGPLoad component). Save the job, export it, and then just run the script.
If you are still facing an issue, let me know. I can also upload images.
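As a rough illustration of the YAML control file mentioned above, here is a minimal Python sketch that renders one from a handful of parameters, similar in spirit to feeding it from Talend context variables. The function name and all sample values (database, hosts, file path, table) are hypothetical, and only a small subset of the control file keys is shown, so treat it as a starting point and check it against the gpload documentation for your version.

```python
def render_gpload_control(database, user, host, port, local_host,
                          input_file, table, gpfdist_port=8081,
                          delimiter="|", mode="insert"):
    """Return the text of a basic gpload YAML control file.

    Hypothetical helper: it only fills in a minimal set of keys
    (connection, one input file, one output table).
    """
    lines = [
        "VERSION: 1.0.0.1",
        f"DATABASE: {database}",
        f"USER: {user}",
        f"HOST: {host}",
        f"PORT: {port}",
        "GPLOAD:",
        "  INPUT:",
        "    - SOURCE:",
        "        LOCAL_HOSTNAME:",
        f"          - {local_host}",
        f"        PORT: {gpfdist_port}",
        "        FILE:",
        f"          - {input_file}",
        "    - FORMAT: text",
        f"    - DELIMITER: '{delimiter}'",
        "  OUTPUT:",
        f"    - TABLE: {table}",
        f"    - MODE: {mode}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # All values below are made-up placeholders.
    print(render_gpload_control(
        database="sales",
        user="gpadmin",
        host="mdw.example.com",
        port=5432,
        local_host="etl1.example.com",
        input_file="/data/customers.txt",
        table="public.customers",
    ))
```

Writing the result to a file and pointing the component (or gpload itself) at that file keeps the connection details in one place, which is the same idea as using contexts in the exported job.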
How did you get the GPLoad component to work at all? I have downloaded the gpload tools for Windows from Pivotal, and now I can get my gpload component to successfully ping my server, but nothing more. It doesn't even try to load data from a file. If I give it a non-existent file, table, or schema, it does not complain one bit. When I run the GPLoad component, this is what I get:
Setting up PATH for Greenplum Loaders
LOADERS environment variables configured successfully.
To specify: I set the gpload path in Advanced settings to this (otherwise it 'cannot find file specified'):
So what on earth do I do to get it to actually load a file? I have my own .yml file from when I used to run gpload manually, but I would much prefer to get Talend to do it (I know my .yml file and the gpload tools work, as I have used them manually for the same data, table, and so on).
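One way to narrow this down might be to run gpload against the same .yml file from a small script and look at its exit code and output, so you can compare that with what the Talend component prints. The sketch below assumes gpload is reachable under the name or path you pass in and uses its -f (control file) and -l (log file) options; the helper names and the file names are hypothetical.

```python
import subprocess


def build_gpload_command(control_file, gpload_path="gpload", log_file=None):
    """Build the argument list for a gpload run (hypothetical helper)."""
    cmd = [gpload_path, "-f", control_file]
    if log_file:
        # -l tells gpload where to write its log file.
        cmd += ["-l", log_file]
    return cmd


def run_gpload(control_file, gpload_path="gpload", log_file=None):
    """Run gpload and return (exit code, stdout, stderr)."""
    cmd = build_gpload_command(control_file, gpload_path, log_file)
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode, result.stdout, result.stderr


if __name__ == "__main__":
    # "load.yml" and "gpload.log" are placeholder file names.
    code, out, err = run_gpload("load.yml", log_file="gpload.log")
    print("exit code:", code)
    print(out)
    print(err)
```

If this run reports errors for a non-existent file or table while the Talend job stays silent, that would suggest the path configured in the component is not actually reaching gpload itself.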