Four Stars

only 1 row with null values flowing through my tPigLoad components

Hi all,

 

I am using:

- Talend Open Studio for Big Data 6.4.1 on Windows 10 Pro (x64)

- Hadoop v2.0 in the cloud, provided by the IBM demo cloud and running on Red Hat 6 (x86_64)

 

My job is supposed to use 2 tPigLoad components to import 2 tables from HDFS (1 main and 1 reference), do the lookup mapping with tPigMap, and export 2 tables (results and rejects) to a new HDFS directory with tPigStoreResult.
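As an illustration of the intended behaviour (this is not the code that Talend or Pig generates), the lookup step can be sketched in plain Python. The function name, column names and sample rows below are hypothetical and only show which rows should end up in the results and which in the rejects.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]


def lookup_join(main_rows: List[Row], ref_rows: List[Row], key: str) -> Tuple[List[Row], List[Row]]:
    """Join main rows against a reference table on `key`.

    Rows with a match go to `results` (enriched with the reference columns);
    rows without a match go to `rejects` unchanged.
    """
    # Index the reference table once so each lookup is a dictionary access.
    ref_by_key = {row[key]: row for row in ref_rows}
    results: List[Row] = []
    rejects: List[Row] = []
    for row in main_rows:
        match = ref_by_key.get(row[key])
        if match is None:
            rejects.append(row)
        else:
            # Values from the main row win if both tables share a column name.
            results.append({**match, **row})
    return results, rejects


# Hypothetical sample data: id "9" has no entry in the reference table.
main = [
    {"id": "1", "amount": "10"},
    {"id": "2", "amount": "20"},
    {"id": "9", "amount": "5"},
]
ref = [
    {"id": "1", "label": "A"},
    {"id": "2", "label": "B"},
]

results, rejects = lookup_join(main, ref, "id")
print(results)
print(rejects)
```

With real data, the results table should contain one enriched row per matched main row, so a single row of nulls points to a problem before the join (loading or connectivity) rather than in the mapping itself.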

 

The job runs without ever finishing, while the data flow shown on the design panel stops after processing only 1 row, which contains null values (see attachment). The output directory is not created.

I also get 2 warnings:

 

[WARN ]: org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[WARN ]: org.apache.pig.PigServer - Empty string specified for jar path

 

The Java debug perspective doesn't give any more information.

Do you have any idea what is going on?

 

Best regards, Sélim

11 REPLIES
Moderator

Re: only 1 row with null values flowing through my tPigLoad components

Hello,

What do your tPigMap component settings look like? Did you follow the online scenario in the Talend Help Center, "Scenario: Joining data about road conditions in a Pig process"?

Best regards

Sabrina

--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.
Four Stars

Re: only 1 row with null values flowing through my tPigLoad components

Is posting not working, or are posts not visible until validated by moderators?

Four Stars

Re: only 1 row with null values flowing through my tPigLoad components

Hi Sabrina,

Thank you for your reply. I have built the job you mentioned (see capture 1 in the .zip file); it is very similar to what I did before. My connection to the Hadoop cluster seems to work (see capture 2). The settings of my tPigLoad, tPigMap and tPigStoreResult components are shown in captures 3, 4 and 5. I also set the JVM arguments for winutils (found here: https://jira.talendforge.org/browse/TBD-1412), as you can see in capture 6.

Thanks in advance for any help,
Sélim

Four Stars

Re: only 1 row with null values flowing through my tPigLoad components

the attachments

Four Stars

Re: only 1 row with null values flowing through my tPigLoad components

The result is the same, with only 1 null row flowing through my job (see text file). I now also get a java.lang.UnsatisfiedLinkError exception.
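One thing that can be checked on a Windows client when a java.lang.UnsatisfiedLinkError shows up is whether Hadoop's native Windows helpers are where the job expects them. The sketch below assumes that the JVM argument set for winutils is -Dhadoop.home.dir and that both winutils.exe and hadoop.dll should sit in its bin subfolder; the stack trace is not shown in this thread, so this may not be the actual cause. The function name is hypothetical.

```python
import tempfile
from pathlib import Path
from typing import List

# Files Hadoop's Windows support normally looks for under <hadoop home>\bin.
# Assumption: the job's -Dhadoop.home.dir points at that <hadoop home> folder.
EXPECTED_FILES = ("winutils.exe", "hadoop.dll")


def missing_native_files(hadoop_home: str) -> List[str]:
    """Return the expected native files that are absent from <hadoop_home>/bin."""
    bin_dir = Path(hadoop_home) / "bin"
    return [name for name in EXPECTED_FILES if not (bin_dir / name).is_file()]


# Demo against a throwaway directory that only contains winutils.exe.
with tempfile.TemporaryDirectory() as home:
    demo_bin = Path(home) / "bin"
    demo_bin.mkdir()
    (demo_bin / "winutils.exe").touch()
    demo_missing = missing_native_files(home)

print(demo_missing)  # ['hadoop.dll']
```

To use it for real, call missing_native_files with the folder passed to the JVM argument; an empty list means both files are present, which rules out a missing file but not a 32-bit/64-bit or version mismatch.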

Four Stars

Re: only 1 row with null values flowing through my tPigLoad components

Sorry for the many split messages. I had a hard time posting at all (I think including a web address made the post invalid).

Moderator

Re: only 1 row with null values flowing through my tPigLoad components

Hello,

Based on the error message you posted, here is a Jira issue: https://jira.talendforge.org/browse/TBD-2462

Are you able to use tPigLoad and tPigStoreResult to read data from HBase and to write them to HDFS successfully without lookup?

Best regards

Sabrina

 

Four Stars

Re: only 1 row with null values flowing through my tPigLoad components

Hi Sabrina,

 

Unfortunately, the HBase RegionServer process is not responding on 2 of my data nodes, and my last data node is not sending any heartbeat at all. I have left a message for IBM, which provides the cluster as a cloud service, but I see that they have not answered other topics on the forum. Maybe I could keep the PigStorage option for the moment?

I have tried without the lookup, but I get the same error.

 

Good evening