Facing Issues with Spark Context Initialization Using Spark Big Data Batch Job


Good afternoon,

 

My cluster is well configured, but I am getting the following error log. Can anyone help me?

 

Starting job testAvro at 15:16 02/11/2017.

[statistics] connecting to socket on port 3464
[statistics] connected
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/C:/Users/0martinjr/.Java/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/C:/Users/0martinjr/.Java/lib/talend-spark-assembly-1.6.0-cdh5.8.1-hadoop2.6.0-cdh5.8.1-with-hive.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
[WARN ]: org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[WARN ]: org.apache.spark.SparkConf - In Spark 1.0 and later spark.local.dir will be overridden by the value set by the cluster manager (via SPARK_LOCAL_DIRS in mesos/standalone and LOCAL_DIRS in YARN).
[ERROR]: org.apache.spark.SparkContext - Error initializing SparkContext.
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
 at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:124)
 at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:64)
 at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
 at org.apache.spark.SparkContext.<init>(SparkContext.scala:541)
 at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:59)
 at poc_sebastien.testavro_0_1.testAvro.runJobInTOS(testAvro.java:1291)
 at poc_sebastien.testavro_0_1.testAvro.main(testAvro.java:1172)
[WARN ]: org.apache.spark.metrics.MetricsSystem - Stopping a MetricsSystem that is not running
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
 at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:124)
 at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:64)
 at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
 at org.apache.spark.SparkContext.<init>(SparkContext.scala:541)
 at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:59)
 at poc_sebastien.testavro_0_1.testAvro.runJobInTOS(testAvro.java:1291)
 at poc_sebastien.testavro_0_1.testAvro.main(testAvro.java:1172)
Exception in thread "main" java.lang.RuntimeException: TalendJob: 'testAvro' - Failed with exit code: 1.
 at poc_sebastien.testavro_0_1.testAvro.main(testAvro.java:1182)
[ERROR]: poc_sebastien.testavro_0_1.testAvro - TalendJob: 'testAvro' - Failed with exit code: 1.
Job testAvro ended at 15:38 02/11/2017. [exit code=1]

 

Thanks in advance,

 

sebastien1981


Re: Facing Issues with Spark Context Initialization Using Spark Big Data Batch Job

We are using Talend 6.3, CDH 5.9, and Spark 1.6.

 

Cluster version:
Property Type: Repository: HDFS:testSeb
Distribution: Cloudera; Version: Cloudera CDH5.8 (YARN mode)
Spark Mode: YARN Client

Configuration:
Resource manager: "xxxxxxx:8032"
Set resourcemanager scheduler address: "xxx:8030"
Set jobhistory address: "xxxx:10020"
Set staging directory: "/user"

Authentication:
Use Kerberos authentication
Resource manager principal: "xx"
Job history principal: "xxx"

Use a keytab to authenticate
Principal: "xxxxx"; Keytab: "hard-coded path"
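For reference, the Studio settings above should translate to roughly the following client-side Hadoop/YARN properties. This is only a sketch: the hostnames and principals are kept as the redacted placeholders from the post, and the property names assume the standard yarn-site.xml / mapred-site.xml keys, so they should be checked against the cluster's actual configuration:

```
# Approximate client-side properties for the configuration above
# (hostnames and principals are the redacted placeholders, not real values)
yarn.resourcemanager.address=xxxxxxx:8032
yarn.resourcemanager.scheduler.address=xxx:8030
mapreduce.jobhistory.address=xxxx:10020
yarn.app.mapreduce.am.staging-dir=/user
yarn.resourcemanager.principal=xx
mapreduce.jobhistory.principal=xxx
```

The keytab login itself corresponds to Hadoop's `UserGroupInformation.loginUserFromKeytab(principal, keytabPath)`. A mismatch in any of these values (wrong ResourceManager host/port, wrong principal, or an unreadable keytab path) is a common cause of the "Yarn application has already ended!" failure, since the ApplicationMaster never launches; the YARN ResourceManager logs usually show the underlying reason.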

 
