Five Stars

Greenplum - gpload process error

We have a small Greenplum cluster and are trying to run a Merge operation using the tgreenplumGPload component, but we are getting the error below.
Environment details (OS):
Talend server - Windows Server 2012
Greenplum cluster - CentOS 7
Hadoop cluster - CentOS 7

Getting following error :
Exception in thread "Thread-1" java.lang.RuntimeException: Cannot run program "gpload": CreateProcess error=2, The system cannot find the file specified

A screenshot of the error is attached.

Job flow and settings of the tgreenplumGPload component:


The gpfdist program is running on the Greenplum master host:
$ ps -A | grep gpfdist
20071 pts/0    00:00:00 gpfdist
$


Do I need to copy the file from the local Windows machine on which the Talend job runs to the remote Linux server that hosts the Greenplum master? It would be a great help if you could comment on my current data flow.
Current data flow:
                                       tgreenplumconnection
                                      |
Read from SQL server -->hdfs -->tmap-->tgreenplumGPload -->tgreenplumCommit
Q1: How do I get the source HDFS data into the directory served by gpfdist on the Greenplum side, so that the gpload merge operation can start using it? We cannot use gphdfs because the goal is a gpload merge operation. Please suggest an alternative way to do this, if there is one.
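
One possible way to stage the HDFS file, sketched in Python: pull it out of HDFS with `hdfs dfs -get` into the directory that gpload/gpfdist reads from. This assumes the script runs on a host that has the Hadoop client installed and can write to that directory. The HDFS path below is hypothetical; the local directory is the one that appears in the gpload log in this thread. It is only a sketch under those assumptions, not a tested Talend setup.

```python
import subprocess
from pathlib import PurePosixPath


def build_hdfs_get_command(hdfs_path, local_dir):
    """Return the argv that copies one HDFS file into a local directory."""
    # `hdfs dfs -get <src> <localdst>` copies from HDFS to the local filesystem.
    target = str(PurePosixPath(local_dir) / PurePosixPath(hdfs_path).name)
    return ["hdfs", "dfs", "-get", hdfs_path, target]


def stage_for_gpfdist(hdfs_path, local_dir):
    """Copy the file and return the local path that gpload should point at."""
    cmd = build_hdfs_get_command(hdfs_path, local_dir)
    # check=True raises CalledProcessError if the hdfs client exits non-zero.
    subprocess.run(cmd, check=True)
    return cmd[-1]


if __name__ == "__main__":
    # Hypothetical HDFS source path, for illustration only.
    print(build_hdfs_get_command("/data/staging/gp_RevenueReport_stg0.txt",
                                 "/home/gpadmin/demo"))
```

The returned local path is what the FILE entry of the gpload control file would then reference.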

I checked that the following runs successfully on the Greenplum server:
$ gpload -f gpload.yml
2017-02-25 20:20:48|INFO|gpload session started 2017-02-25 20:20:48
2017-02-25 20:20:48|INFO|started gpfdist -p 8081 -P 8082 -f "/home/gpadmin/demo/gp_RevenueReport_stg0.txt" -t 30
2017-02-25 20:20:48|INFO|running time: 0.20 seconds
2017-02-25 20:20:48|INFO|rows Inserted          = 0
2017-02-25 20:20:48|INFO|rows Updated           = 3
2017-02-25 20:20:48|INFO|data formatting errors = 0
2017-02-25 20:20:48|INFO|gpload succeeded
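
For reference, a minimal sketch of what a merge-mode gpload control file can look like, generated here from Python so the pieces are explicit. The database, host, table, and column names are hypothetical placeholders and are not taken from the actual gpload.yml in this thread; the data file path, gpfdist port, and delimiter are the ones visible in the logs.

```python
def render_gpload_control(database, user, host, data_file, table,
                          match_columns, update_columns,
                          port=5432, gpfdist_port=8081, delimiter="|"):
    """Return gpload control-file text (YAML) that merges data_file into table."""
    lines = [
        "---",
        "VERSION: 1.0.0.1",
        "DATABASE: %s" % database,
        "USER: %s" % user,
        "HOST: %s" % host,
        "PORT: %d" % port,
        "GPLOAD:",
        "   INPUT:",
        "    - SOURCE:",
        "         PORT: %d" % gpfdist_port,
        "         FILE:",
        "           - %s" % data_file,
        "    - FORMAT: text",
        "    - DELIMITER: '%s'" % delimiter,
        "   OUTPUT:",
        "    - TABLE: %s" % table,
        "    - MODE: merge",
        "    - MATCH_COLUMNS:",
    ]
    # Rows are matched on these columns; matching rows are updated, others inserted.
    lines += ["        - %s" % col for col in match_columns]
    lines.append("    - UPDATE_COLUMNS:")
    lines += ["        - %s" % col for col in update_columns]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # All names below except the file path are hypothetical.
    print(render_gpload_control(
        database="demo_db", user="gpadmin", host="gp-master",
        data_file="/home/gpadmin/demo/gp_RevenueReport_stg0.txt",
        table="public.revenue_report",
        match_columns=["report_id"], update_columns=["revenue"]))
```

The printed text can be saved as gpload.yml and passed to `gpload -f gpload.yml`, as in the run above.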

Main cause:
The Greenplum database server (Linux) is remote from the Talend ETL server (Windows), so the error occurs when I run the job from the Windows server. Also, I am not able to configure the tgreenplumGPload component.




Any help would be much appreciated. Thanks in advance.
3 REPLIES
Moderator

Re: Greenplum - gpload process error

Hi,
Could you please indicate which build version you are using when you get this issue? Are you able to load the same file into GPDB by running the gpload utility from the command line?
Best regards
Sabrina
Five Stars

Re: Greenplum - gpload process error

Hello xdshi,
Yes, I am able to load the file from the Greenplum database server, and I am also able to point to the external table from the ETL host.
However, I am not able to write data to the target table: the insert from tgreenplumGPload fails.
Here are the details I posted, with screenshots:
https://www.talendforge.org/forum/viewtopic.php?id=56114

Environment:
Greenplum loader tools for Windows 4.3 - gpload version 4.3.8.1 build 1
Python - 2.5.4, 64-bit
Talend - Windows Server 2012 R2

I am looking for a way to use tgreenplumGPload when it does not insert any records. The job does not even throw an error: it completes with exit code 0.



Also, when I added a breakpoint on the tgreenplumGPload component, I got the following:

Starting job gpload_test at 07:45 03/03/2017.


connecting to socket on port 4007
connected
Exception in thread "Thread-1" java.lang.RuntimeException: Cannot run program "gpload": CreateProcess error=2, The system cannot find the file specified
at bigdata.gpload_test_0_1.gpload_test$2.run(gpload_test.java:848)
Caused by: java.io.IOException: CreateProcess error=2, The system cannot find the file specified
at java.lang.ProcessImpl.create(Native Method)
at java.lang.ProcessImpl.<init>(ProcessImpl.java:386)
at java.lang.ProcessImpl.start(ProcessImpl.java:137)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
at java.lang.Runtime.exec(Runtime.java:620)
at java.lang.Runtime.exec(Runtime.java:528)
at bigdata.gpload_test_0_1.gpload_test$2.run(gpload_test.java:836)
disconnected
Job gpload_test ended at 07:45 03/03/2017.
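
The stack trace shows the job calling Runtime.exec with the bare name "gpload". As far as I understand, on Windows that ends up in CreateProcess, which does not apply the command prompt's PATHEXT lookup, so a gpload.py or gpload.bat on PATH would not be found under the name "gpload". A small Python sketch to see which variants of the program actually exist on PATH (the assumption that only an .exe is directly launchable is mine, and is marked in the code):

```python
import os


def variants_of(name, filenames):
    """Return the file names whose stem equals `name` (case-insensitive)."""
    return [f for f in filenames
            if os.path.splitext(f)[0].lower() == name.lower()]


def has_exe(variants):
    """Assumption: only a .exe variant can be started by bare name without a shell."""
    return any(v.lower().endswith(".exe") for v in variants)


def scan_path(name, path_value=None):
    """Map each PATH directory to the variants of `name` found in it."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    found = {}
    for directory in path_value.split(os.pathsep):
        if not directory or not os.path.isdir(directory):
            continue
        try:
            hits = variants_of(name, os.listdir(directory))
        except OSError:
            continue  # unreadable directory, skip it
        if hits:
            found[directory] = hits
    return found


if __name__ == "__main__":
    for directory, hits in scan_path("gpload").items():
        print(directory, hits, "exe present" if has_exe(hits) else "no .exe")
```

If this lists only gpload.py or gpload.bat, that would be consistent with the "CreateProcess error=2" above even though the command works from the command prompt.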


Please, I need your direction on this. Thanks.
Five Stars

Re: Greenplum - gpload process error

I noticed the following on the ETL server: I am able to run this command from the Windows command prompt:
c:\>gpload.py -f gpload.yml
The output shows:
c:\>gpload.py -f gpload.yml
2017-03-04 06:03:07|INFO|gpload session started 2017-03-04 06:03:07
2017-03-04 06:03:07|INFO|started gpfdist -p 8081 -P 8082 -f "C:/gp_RevenueReport_stg0.txt" -t 30
WARNING:  nonstandard use of \\ in a string literal
LINE 1: ...tg0.txt') format'text' (delimiter '|' null '' escape '\\' )
                                                                ^
HINT:  Use the escape string syntax for backslashes, e.g., E'\\'.
2017-03-04 06:03:07|INFO|running time: 0.23 seconds
NOTICE:  table "temp_staging_gpload_25e5cb21_00ca_11e7_b077_bc764e20d911" does not exist, skipping
2017-03-04 06:03:08|INFO|rows Inserted          = 20
2017-03-04 06:03:08|INFO|rows Updated           = 200
2017-03-04 06:03:08|INFO|data formatting errors = 0
2017-03-04 06:03:08|INFO|gpload succeeded
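
Since the Talend job can finish with exit code 0 without loading anything, it may help to check the row counts in the gpload output rather than the exit code alone. A minimal Python sketch that parses the summary lines in the format shown above (the function name is my own; it only assumes the log format seen in this thread):

```python
import re

# Matches summary lines such as "...|INFO|rows Inserted          = 20".
SUMMARY = re.compile(
    r"\|INFO\|(rows Inserted|rows Updated|data formatting errors)\s*=\s*(\d+)")


def parse_gpload_summary(output):
    """Extract row counts and the success flag from gpload console output."""
    counts = {label: int(value) for label, value in SUMMARY.findall(output)}
    counts["succeeded"] = "|INFO|gpload succeeded" in output
    return counts


if __name__ == "__main__":
    sample = "\n".join([
        "2017-03-04 06:03:08|INFO|rows Inserted          = 20",
        "2017-03-04 06:03:08|INFO|rows Updated           = 200",
        "2017-03-04 06:03:08|INFO|data formatting errors = 0",
        "2017-03-04 06:03:08|INFO|gpload succeeded",
    ])
    print(parse_gpload_summary(sample))
```

A run whose output has no "rows Inserted" or "rows Updated" lines at all would show up as missing keys, which is a quick way to tell that gpload never actually ran.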


However, when I try to merge the data through tgreenplumGPload, it fails to merge the data into the target table.