One Star

tPigLoad failed with HCatLoader load function

Hi,
tPigLoad fails to connect to a Hive table with the HCatLoader load function, but retrieving the same table via the Pig console succeeds. I would appreciate your help with troubleshooting this issue.
Version of Talend Open Studio: 5.5.0
Hadoop: HDP 2.1 for Windows
1. Retrieving the table from the Pig console - succeeded
a = LOAD 'Talend.weblog' using org.apache.hcatalog.pig.HCatLoader();
2. Using the tPigLoad & tPigStoreResult components - failed
Job log
=====
User: molin
Name: BIGDATADEMO_test_0.1_tPigLoad_1
Application Type: MAPREDUCE
Application Tags:
State: FAILED
FinalStatus: FAILED
Started: 10-Jun-2014 10:37:42
Elapsed: 4sec
Tracking URL: History
Diagnostics: Application application_1402325418471_0006 failed 2 times due to AM Container for appattempt_1402325418471_0006_000002 exited with exitCode: 1 due to: Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException:
org.apache.hadoop.util.Shell$ExitCodeException:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
at org.apache.hadoop.util.Shell.run(Shell.java:418)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:300)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
1 file(s) moved.
c:\hdpdata\hadoop\local\usercache\molin\appcache\application_1402325418471_0006\container_1402325418471_0006_02_000001>if 0 NEQ 0 exit /b 0
Container exited with a non-zero exit code 1
.Failing this attempt.. Failing the application.
Trace log of Talend
============
connecting to socket on port 4033
connected
connecting to socket on port 4344
connected
: org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
: org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
: org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://sql230:8020
: hive.metastore - Trying to connect to metastore with URI thrift://sql230:9083
: hive.metastore - Connected to metastore.
: org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
: org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
: org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=}
: org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
: org.apache.hadoop.conf.Configuration.deprecation - mapred.textoutputformat.separator is deprecated. Instead, use mapreduce.output.textoutputformat.separator
: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
: org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at sql230/192.168.56.1:8032
: org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
: org.apache.hadoop.conf.Configuration.deprecation - mapred.job.reduce.markreset.buffer.percent is deprecated. Instead, use mapreduce.reduce.markreset.buffer.percent
: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
: org.apache.hadoop.conf.Configuration.deprecation - mapred.output.compress is deprecated. Instead, use mapreduce.output.fileoutputformat.compress
: org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
: org.apache.hadoop.conf.Configuration.deprecation - mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job4686189630677206316.jar
: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job4686189630677206316.jar created
: org.apache.hadoop.conf.Configuration.deprecation - mapred.jar is deprecated. Instead, use mapreduce.job.jar
: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
: org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker.http.address is deprecated. Instead, use mapreduce.jobtracker.http.address
: org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at sql230/192.168.56.1:8032
: org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
: org.apache.hadoop.mapreduce.JobSubmitter - Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
: org.apache.hadoop.conf.Configuration.deprecation - mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
: org.apache.hadoop.mapred.FileInputFormat - Total input paths to process : 1
: org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
: org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
: org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1402325418471_0006
: org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1402325418471_0006
: org.apache.hadoop.mapreduce.Job - The url to track the job: http://sql230:8088/proxy/application_1402325418471_0006/
: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1402325418471_0006
: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases tPigLoad_1_row1_RESULT
: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: tPigLoad_1_row1_RESULT C: R:
: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_1402325418471_0006 has failed! Stop running all dependent jobs
: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
: org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
: org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
2.4.0.2.1.1.0-385 0.12.1.2.1.1.0-385 molin 2014-06-10 10:37:36 2014-06-10 10:37:47 UNKNOWN
Failed!
Failed Jobs:
JobId Alias Feature Message Outputs
job_1402325418471_0006 tPigLoad_1_row1_RESULT MAP_ONLY Message: Job failed! /user/hdp/weblog/test,
Input(s):
Failed to read data from "talend.weblog"
Output(s):
Failed to produce result in "/user/hdp/weblog/test"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_1402325418471_0006

: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
disconnected
disconnected
Job test ended at 10:37 10/06/2014.
3 REPLIES
One Star

Re: tPigLoad failed with HCatLoader load function

I find the error is exactly the same as the error described at http://www.srccodes.com/p/article/46/noclassdeffounderror-org-apache-hadoop-service-compositeservice...
However, even after I added the classpath property to yarn-site.xml as advised in that post, the error still remains. Could anyone shed some light on this?
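For reference, the change I tried in yarn-site.xml looked roughly like the following. This is only a sketch: the Windows-style paths and environment variables are illustrative and need to be adjusted to your actual HDP installation layout.

```xml
<!-- yarn-site.xml: Windows-style application classpath (paths illustrative) -->
<property>
  <name>yarn.application.classpath</name>
  <value>%HADOOP_CONF_DIR%,%HADOOP_COMMON_HOME%/share/hadoop/common/*,%HADOOP_COMMON_HOME%/share/hadoop/common/lib/*,%HADOOP_HDFS_HOME%/share/hadoop/hdfs/*,%HADOOP_HDFS_HOME%/share/hadoop/hdfs/lib/*,%HADOOP_YARN_HOME%/share/hadoop/yarn/*,%HADOOP_YARN_HOME%/share/hadoop/yarn/lib/*,%HADOOP_MAPRED_HOME%/share/hadoop/mapreduce/*,%HADOOP_MAPRED_HOME%/share/hadoop/mapreduce/lib/*</value>
</property>
```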
thanks,
Mo
One Star

Re: tPigLoad failed with HCatLoader load function

I think I have found the cause.
After installing the trial Enterprise studio, I am now able to trace into the job code. I found that Talend Studio ignores the classpath setting in yarn-site.xml and always adds the classpath below as a property in the function tPigLoad_1Process(), which is the wrong classpath for HDP on Windows.
props_tPigLoad_1.put("yarn.application.classpath", "/etc/hadoop/conf,/usr/lib/hadoop/*,/usr/lib/hadoop/lib/*,/usr/lib/hadoop-hdfs/*,/usr/lib/hadoop-hdfs/lib/*,/usr/lib/hadoop-yarn/*,/usr/lib/hadoop-yarn/lib/*,/usr/lib/hadoop-mapreduce/*,/usr/lib/hadoop-mapreduce/lib/*");
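A possible workaround sketch (untested against a live cluster; the property name comes from the generated code above, the class name, helper, and Windows-style value are my own illustrations): since java.util.Properties keeps only the last value put for a key, the hard-coded value can simply be overwritten before the Properties object is handed to PigServer.

```java
import java.util.Properties;

public class YarnClasspathOverride {

    // Mirrors what the generated tPigLoad_1Process() does, then overwrites the
    // hard-coded Linux-style classpath with a caller-supplied Windows-style one.
    // Helper name and the Windows value are illustrative, not Talend API.
    static Properties buildPigProperties(String windowsClasspath) {
        Properties props = new Properties();
        // Value hard-coded by the Studio (abbreviated from the generated code above)
        props.put("yarn.application.classpath",
                "/etc/hadoop/conf,/usr/lib/hadoop/*,/usr/lib/hadoop/lib/*");
        // Workaround: last put wins, so this replaces the wrong value
        props.put("yarn.application.classpath", windowsClasspath);
        return props;
    }

    public static void main(String[] args) {
        String winCp = "%HADOOP_CONF_DIR%;%HADOOP_COMMON_HOME%\\share\\hadoop\\common\\*";
        Properties props = buildPigProperties(winCp);
        // Prints the Windows-style classpath, confirming the override took effect
        System.out.println(props.getProperty("yarn.application.classpath"));
    }
}
```

In a real job this would only help if the Properties object can be amended after the generated code populates it but before PigServer is constructed, which is exactly what I cannot do from the Studio, hence the questions below.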
The advice/help I need
1. Is this a bug in the tPigLoad component? Where can I report this issue?
2. The debug trace stops at "pigServer_tPigLoad_1.executeBatch();", which tells me "Source not found" for the file PigServer.class. Where can I download the source of PigServer.class so that I can continue tracing to locate the problematic call?
Thank you,
Mo
Moderator

Re: tPigLoad failed with HCatLoader load function

Hi Nemolin,
Could you please open a ticket on the Talend Support Portal, so that our colleagues from the support team can check with priority, through the support cycle, whether it is a bug.
Best regards
Sabrina
--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.