I am using:
- Talend Open Studio for Big Data 6.4.1 on Windows 10 Pro (x64)
- Hadoop v2.0 in the cloud, provided by the IBM demo cloud and running Red Hat 6 (x86_64)
My job uses two tPigLoad components to read two tables from HDFS (one main and one lookup reference), performs the lookup mapping with tPigMap, and writes two tables (results and rejects) to a new HDFS directory with tPigStoreResult.
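Conceptually, the Pig script behind such a job should be roughly equivalent to the following sketch (all paths, field names, and the semicolon delimiter are placeholders, not my actual schema):

```pig
-- Load the main table and the lookup reference table (hypothetical paths/fields)
main = LOAD '/user/demo/main' USING PigStorage(';') AS (id:chararray, val:chararray);
ref  = LOAD '/user/demo/ref'  USING PigStorage(';') AS (id:chararray, label:chararray);

-- Left outer join so that unmatched main rows are kept as rejects
joined = JOIN main BY id LEFT OUTER, ref BY id;

-- Rows with a match go to results, rows without a match go to rejects
results = FILTER joined BY ref::id IS NOT NULL;
rejects = FILTER joined BY ref::id IS NULL;

STORE results INTO '/user/demo/out/results' USING PigStorage(';');
STORE rejects INTO '/user/demo/out/rejects' USING PigStorage(';');
```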
The job runs without ever finishing, while on the design panel the data flow ends with only one row processed, containing null values (please see the attachment). The output directory is not created.
I also get two warnings:
[WARN ]: org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[WARN ]: org.apache.pig.PigServer - Empty string specified for jar path
Looking at the Java debugging perspective doesn't give any more information ...
Do you have any idea of what is going on?
Best regards, Sélim
What do your tPigMap component settings look like? Did you follow the online scenario TalendHelpCenter: Scenario: Joining data about road conditions in a Pig process?
Thank you for your reply. I have done the job you mentioned (see capture 1 in the .zip file); it is very similar to what I did before, by the way. My connection to the Hadoop cluster seems to work (capture 2). You can find the settings of my tPigLoad, tPigMap, and tPigStoreResult components in captures 3, 4, and 5. I also set the JVM arguments for winutils (found here: https://jira.talendforge.org/browse/TBD-1412), as you can see in capture 6.
Thanks in advance for any help,
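For the record, the winutils workaround referenced in TBD-1412 generally comes down to a JVM argument pointing Hadoop at a local folder whose bin subfolder contains winutils.exe; a sketch, where C:\hadoop is an assumed install path (yours may differ):

```
-Dhadoop.home.dir=C:\hadoop
```

with winutils.exe placed at C:\hadoop\bin\winutils.exe.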
The result is the same, with only one null row flowing through my job (see the text file). And now I also get a java.lang.UnsatisfiedLinkError exception ...
Sorry for the many split messages. I had a hard time simply posting (I think including a web address made the post invalid or something).
From the error message you posted, here is a JIRA issue: https://jira.talendforge.org/browse/TBD-2462
Are you able to use tPigLoad and tPigStoreResult to read data from HBase and to write them to HDFS successfully without lookup?
Unfortunately, the HBase RegionServer process does not respond on two of my data nodes, and my last data node is not sending any heartbeat at all. I have left a message with IBM, which provides the cloud as a service cluster, but I see no answer from them in other topics of the forum. Maybe for the moment I could keep the PigStorage option?
I have tried without the lookup, but I get the same error ...