Spark Job fails on Talend Cloud+Remote Engine with 'com.esotericsoftware.kryo.KryoException: Buffer underflow' error

Problem Description

The Spark Job (tHiveInput > tUniqRow > tMap_1 > tMap_2 > tFileOutputDelimited) runs fine using the Local and Remote Engine connections in Studio.

However, after publishing the same Job from Studio to Talend Cloud and running it from the Cloud web UI on a Remote Engine, the Job fails with the following exception:

org.apache.spark.SparkException: Job aborted due to stage failure: Exception while getting task result: com.esotericsoftware.kryo.KryoException: Buffer underflow.
Serialization trace:
org$apache$spark$storage$BlockManagerId$$topologyInfo_ (org.apache.spark.storage.BlockManagerId)
org$apache$spark$scheduler$CompressedMapStatus$$loc (org.apache.spark.scheduler.CompressedMapStatus)
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1499) 
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1487) 
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1486) 
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)

Root Cause

The cause becomes visible when you compare the .sh files of the same Job built in two different ways, as shown in steps 1 and 2 below:

  1. Build the Job in Studio by right-clicking the Job and selecting Build Job. Open the .sh file. The -libjars option lists a large number of JAR files.

    -libjars ../lib/hadoop-aws-2.6.0-cdh5.13.0.jar,../lib/hadoop-yarn-client-2.6.0-cdh5.13.0.jar,../lib/talend-bigdata-aws-1.0.0-20180116.jar,../lib/commons-httpclient-3.1.jar,../lib/commons-logging-1.2.jar,../lib/commons-compress-1.4.1.jar,../lib/hadoop-annotations-2.6.0-cdh5.13.0.jar,../lib/hadoop-yarn-common-2.6.0-cdh5.13.0.jar,../lib/hadoop-auth-2.6.0-cdh5.13.0.jar,../lib/stax-api-1.0-2.jar,../lib/leveldbjni-all-1.8.jar,../lib/dom4j-1.6.1.jar,../lib/zookeeper-3.4.6.jar,../lib/jackson-core-asl-1.9.13.jar,../lib/hadoop-mapreduce-client-core-2.6.0-cdh5.13.0.jar,../lib/slf4j-log4j12-1.7.16.jar,../lib/activation-1.1.1.jar,../lib/paranamer-2.6.jar,../lib/htrace-core4-4.0.1-incubating.jar,../lib/api-asn1-api-1.0.0-/lib/apacheds-kerberos-codec-2.0.0-M15.jar,../lib/commons-io-2.4.jar,../lib/antlr-runtime-3.5.2.jar,../lib/hadoop-mapreduce-client-shuffle-2.6.0-cdh5.13.0.jar,../lib/commons-net-3.1.jar,../lib/hadoop-client-2.6.0-cdh5.13.0.jar,../lib/hadoop-yarn-server-common-2.6.0-cdh5.13.0.jar,../lib/commons-digester-1.8.jar,../lib/jaxb-api-2.2.2.jar,../lib/servlet-api-2.5.jar,../lib/curator-client-2.7.1.jar,../lib/hadoop-mapreduce-client-app-2.6.0-cdh5.13.0.jar,../lib/log4j-1.2.17.jar,../lib/talend-dataflow-spark2-lib-6.0.0-20160502.jar,../lib/jackson-mapper-asl-1.9.13.jar,../lib/avro-1.7.7.jar,../lib/jackson-jaxrs-1.9.2.jar,../lib/gson-2.2.4.jar,../lib/commons-lang-2.6.jar,../lib/snappy-java-1.1.2.6.jar,../lib/jdiff-1.0.9.jar,../lib/jetty-util-6.1.26.cloudera.4.jar,../lib/hadoop-mapreduce-client-common-2.6.0-cdh5.13.0.jar,../lib/log4j-1.2.16.jar,../lib/commons-configuration-1.9.jar,../lib/talend-mapred-lib.jar,../lib/commons-beanutils-1.9.2.jar,../lib/xz-1.0.jar,./aaa_xxxx_yyy_test_0_1.jar
  2. Publish the Job from Studio by right-clicking the Job and selecting Publish to Cloud. On the Remote Engine server, open the .sh file of the published Job. The -libjars option is missing most of the JAR files.

    -libjars ../lib/routines.jar,./aaa_xxxx_yyy_test_0_1.jar
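
The comparison in steps 1 and 2 can also be done programmatically. The following is a minimal sketch, not part of any Talend tooling: the function names parse_libjars and missing_jars are hypothetical, and it assumes the -libjars value is a single unquoted, comma-separated token, as in the fragments above. The sample strings are shortened excerpts of those fragments.

```python
import re


def parse_libjars(script_text):
    """Return the JAR paths listed after the -libjars option, or [] if absent.

    Assumes the value is one unquoted, comma-separated token.
    """
    match = re.search(r"-libjars\s+(\S+)", script_text)
    if match is None:
        return []
    return [entry for entry in match.group(1).split(",") if entry]


def missing_jars(studio_script, published_script):
    """List JARs present in the Studio build but absent from the published build."""
    published = set(parse_libjars(published_script))
    return [jar for jar in parse_libjars(studio_script) if jar not in published]


if __name__ == "__main__":
    # Shortened excerpts of the two -libjars values shown in steps 1 and 2.
    studio = ("-libjars ../lib/hadoop-aws-2.6.0-cdh5.13.0.jar,"
              "../lib/avro-1.7.7.jar,./aaa_xxxx_yyy_test_0_1.jar")
    published = "-libjars ../lib/routines.jar,./aaa_xxxx_yyy_test_0_1.jar"
    for jar in missing_jars(studio, published):
        print(jar)
```

In practice you would read the full contents of the two .sh files instead of the inline strings. A long list of missing entries indicates that the published Job was built without its dependency JARs.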

The missing JAR entries in the published Job's -libjars option are caused by a bug in Studio.

Solution

To resolve this issue, request the Patch_20180828_TPS-2639_v1-7.0.1.zip file from Talend Support.

Version history
Revision 4 of 4, last updated 04-10-2019 01:46 PM