Four Stars

Facing Issues with Spark Context Initialization Using Spark Big Data Batch Job

We have been facing severe issues connecting to a Cloudera cluster from a Talend Big Data Spark job. We keep getting the error shown below.

Our job is being submitted to Spark, but we are wondering whether we are missing any Spark configuration parameters on the Talend end.

Talend version: 6.3.1

Cloudera Version: 5.12

Any suggestions would be of great help.
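For context, here is a minimal sketch of the kind of Spark-on-YARN client-mode properties involved (the host names and values below are placeholders, not our actual cluster settings). In Talend these correspond to the Spark configuration tab and its "Advanced properties" table:

```
# Illustrative Spark 1.6 / YARN client-mode properties -- placeholder hosts and sizes
spark.master                                          yarn-client
spark.hadoop.fs.defaultFS                             hdfs://namenode-host:8020
spark.hadoop.yarn.resourcemanager.address             rm-host:8032
spark.hadoop.yarn.resourcemanager.scheduler.address   rm-host:8030
spark.yarn.am.memory                                  512m
spark.driver.memory                                   1g
spark.executor.memory                                 2g
```

In client mode the driver runs inside the Studio/job JVM, so the ResourceManager and NameNode addresses must be reachable from the machine running the job.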

 

Thank you

 

Starting job test_spark at 01:42 24/08/2017.

[statistics] connecting to socket on port 3728
[statistics] connected
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/C:/Talend/6.3.1/Talend-Studio-20161216_1026-V6.3.1/Talend-Studio-20161216_1026-V6.3.1/workspace/.Java/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/C:/Talend/6.3.1/Talend-Studio-20161216_1026-V6.3.1/Talend-Studio-20161216_1026-V6.3.1/workspace/.Java/lib/talend-spark-assembly-1.6.0-cdh5.8.1-hadoop2.6.0-cdh5.8.1-with-hive.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
[WARN ]: org.apache.spark.SparkConf - In Spark 1.0 and later spark.local.dir will be overridden by the value set by the cluster manager (via SPARK_LOCAL_DIRS in mesos/standalone and LOCAL_DIRS in YARN).
[WARN ]: org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[ERROR]: org.apache.spark.SparkContext - Error initializing SparkContext.
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:124)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:64)
    at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:541)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:59)
    at big_data.test_spark_0_1.test_spark.runJobInTOS(test_spark.java:1487)
    at big_data.test_spark_0_1.test_spark.main(test_spark.java:1374)
[WARN ]: org.apache.spark.metrics.MetricsSystem - Stopping a MetricsSystem that is not running
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:124)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:64)
    at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:541)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:59)
    at big_data.test_spark_0_1.test_spark.runJobInTOS(test_spark.java:1487)
    at big_data.test_spark_0_1.test_spark.main(test_spark.java:1374)
Exception in thread "main" java.lang.RuntimeException: TalendJob: 'test_spark' - Failed with exit code: 1.
    at big_data.test_spark_0_1.test_spark.main(test_spark.java:1384)
[ERROR]: big_data.test_spark_0_1.test_spark - TalendJob: 'test_spark' - Failed with exit code: 1.

 

 

  • Big Data

2 REPLIES
Moderator

Re: Facing Issues with Spark Context Initialization Using Spark Big Data Batch Job

Hello,

Is it a Spark batch job? Is your cluster correctly configured? Is your connection defined in the repository? More information would be helpful for us to address your issue; screenshots are preferred.

Note: Please mask your sensitive data.
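Also, since the console only shows that the YARN application ended, the underlying cause (for example, an ApplicationMaster launch failure) is usually visible only in the YARN logs on the cluster side. Assuming log aggregation is enabled, commands along these lines would retrieve them (the application ID below is a placeholder):

```
# Find the ID of the failed application (placeholder ID shown below)
yarn application -list -appStates FAILED,KILLED

# Fetch its aggregated logs, including the ApplicationMaster's stderr
yarn logs -applicationId application_1503500000000_0001
```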

Best regards

Sabrina

Four Stars

Re: Facing Issues with Spark Context Initialization Using Spark Big Data Batch Job

Hi xdshi,

Yes, it is a Spark batch job.

My cluster is configured correctly; I have attached a screenshot for reference.

I am able to run a MapReduce job using this cluster configuration (I am using the Cloudera distribution).

I am facing the issue only when trying to run a Spark batch job.

I have attached screenshots showing my job, the Spark configuration, and the error. Please help me out.