Greenplum tGreenplumGPLoad - some help would be appreciated

One Star

Greenplum tGreenplumGPLoad - some help would be appreciated

Hi Folks,
First thing: newbie to Talend, love the product. Sensational stuff - will be using it a ton.
I am working with a Greenplum Server and want to load data on to the with the tGreenplumGPLoad component. I am having trouble understanding a few things:

1) I am running Talend 5.1.1 from a Separate Windows Server that is connecting to a file and then dumping the data in to a Greenplum 4.2.1 database. Is this allowed? Does Talend have to also be installed on Greenplum somewhere?
2) I am not clear on the Advanced settings.
a) The system seems to be looking for a "GPLOAD.exe" file if we are to specify the GPLoad path. I have spoken with EMC and no such file exists for Windows. Is this really asking for GPFDIST.exe? If there is a GPLOAD.exe file, could someone please point me to it?
b) Our Greenplum Server is UNIX box. It does have access to the Python file gpload.py. Can I use this instead?
I can actually create a job and it looks like it's working, but the data doesn't actually Insert, hence my dilemma. I also am noting that the documentation on using this doesn't include a ton of examples, so I am hopeful that I can work this out, as this component would be a game changer for us.
Thanks,
Sarsippius
One Star

Re: Greenplum tGreenplumGPLoad - some help would be appreciated

Gpload.exe should be found in the loaders package for windows.
One Star

Re: Greenplum tGreenplumGPLoad - some help would be appreciated

Thanks very much - I just installed the load package, and gpload.exe is not found there. GPFDIST.exe is however.
Any thoughts? Are you able to share your version of the installer package?
One Star

Re: Greenplum tGreenplumGPLoad - some help would be appreciated

Also, just to note: I went through the trouble of converting gpload.py to a gpload.exe. I've used both gpfdist.exe and the new gpload.exe file without any luck. I've also set my Environment variables to accept .py as an executable, but this hasn't helped.
The connector:
1) Connects to the server
2) Creates a table (so I know it connects to the database)
3) Runs
4) Says it works
5) But does not insert
Does it make a difference that I am running Talend on my desktop? Do I need to run gpfdist locally prior to running the insert?
One Star

Re: Greenplum tGreenplumGPLoad - some help would be appreciated

Did you figure this out? I seem to be having the same issue.
I don't have any GPLOAD.EXE. If I try GPLOAD.PY I get an error and if I try GPFDIST.EXE it seems to work and it will create a new table if I tell it to do that, but it doesn't load any data.
One Star

Re: Greenplum tGreenplumGPLoad - some help would be appreciated

I was able to convert gpload.py to gpload.exe using http://py2exe.org
After doing that I changed my GPLOAD path within Talend to the new executable and the job no longer fails and it is loading data. So far it is hanging at the end of the execution though until I click Kill.
One Star

Re: Greenplum tGreenplumGPLoad - some help would be appreciated

I have the same problem. I am loading 300000 records from Oracle into Greenplum using tGreenplumOutput. I have set parallel execution. The jobs runs fine and then hangs for a long time.
If i reduce the rows in the source to only 1000 then the job hangs after execution for about 1 minute 20 seconds and then completes successfully. The greater the rows in source the longer the job hangs after execution - and data is not committed to Greenplum in this duration. It is visible in the target only after the job completes successfully. My commit interval is 10000.
Is this behaviour normal, or am I making a mistake.
One Star

Re: Greenplum tGreenplumGPLoad - some help would be appreciated

Did anyone got this working?
if not do anyone know of a "fast" (over 1.000 rows/s but ~10.000 would be great) way to feed data into GP with talend, because the GPoutput runs at ~150 rows/s which is just impossible.
One Star

Re: Greenplum tGreenplumGPLoad - some help would be appreciated

Hi all,
I have got this working 50% - Instead of selecting the 'gpload.exe' or 'gpfdist.py' you need to load the path variable (I think) that allows the use of the files stated above - This is done by selecting:
''/greenplum-loaders-directory/greenplum_loaders_path.bat''
If you're thinking ''well the path states it should be a .exe'', look just a few rows above in the component where it says ''use exisiting control file (YAML formatted)'' ie it tells you it will be a .yml file, but click this and it gives an example of a .ctl (which is I believe similar but useless for gpload: I ran gpload manually to test this).
Once you select the .bat path it will run gpload correctly, however as said only 50% - Mine is stuck in a state where, though it will run correctly, it will not even attempt to load a file of data to my database. I remove the file path entirely or spell it wrong purposfully and it does not complain - I even give it a non-existent schema or table name and it does not complain (it does at least notice bad passwords, usernames and database connections as it does still ping the server it seems, just nothing more than that).
So I realise it has been nearly 2 years since anyone has even tried to help you with this issue, but I hope I can be of some help and maybe vice versa as I am stuck as well.. I think I am heading in the right direction, but from here out I have no clue.
Regards,