'Yarn application has already ended' SparkException when launching a Spark Big Data streaming Job

Talend Version: 6.3.1
Additional Versions: 6.4.1
Product: Big Data
Component: Components

Summary

A Spark Big Data streaming Job reading from Kafka and writing to HBase fails at launch with the SparkException "Yarn application has already ended"; the cause is a missing tHDFSConfiguration component.

Problem Description

A Spark Big Data streaming Job consumes data from a Kafka topic (a tKafkaInput component) and writes it to HBase (tHBaseConfiguration and tHBaseOutput components). The Job is shown below:

[Image: Streamjob.png, the streaming Job layout]

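For context, the code that Talend generates for such a Job boils down to standard Spark Streaming logic along the following lines. This is a minimal sketch only, written against the Spark 1.6 streaming and HBase 1.x client APIs that the stack trace below points to; the class name, topic, ZooKeeper quorum, consumer group, and table and column names are all placeholders, not what the Studio actually generates.

import java.util.Collections;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;

import scala.Tuple2;

public class StreamIngestionSketch {

    public static void main(String[] args) throws Exception {
        // Talend Spark Streaming Jobs of this era run in yarn-client mode
        SparkConf conf = new SparkConf()
                .setAppName("StreamIngestion")
                .setMaster("yarn-client");

        // This constructor is where the Job dies: the YARN client blocks in
        // waitForApplication() until the ApplicationMaster is up, and throws
        // the "Yarn application has already ended" exception when it is not.
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));

        // Receiver-based Kafka input: one consumer thread on the topic
        Map<String, Integer> topics = Collections.singletonMap("my_topic", 1);
        JavaPairReceiverInputDStream<String, String> stream =
                KafkaUtils.createStream(jssc, "zkhost:2181", "my_consumer_group", topics);

        // Write each micro-batch to HBase, one connection per partition
        stream.foreachRDD(rdd -> rdd.foreachPartition(records -> {
            // Assumes hbase-site.xml is on the executor classpath
            Configuration hbaseConf = HBaseConfiguration.create();
            try (Connection connection = ConnectionFactory.createConnection(hbaseConf);
                 Table table = connection.getTable(TableName.valueOf("my_table"))) {
                while (records.hasNext()) {
                    Tuple2<String, String> record = records.next();
                    Put put = new Put(Bytes.toBytes(record._1()));
                    put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("value"),
                            Bytes.toBytes(record._2()));
                    table.put(put);
                }
            }
        }));

        jssc.start();
        jssc.awaitTermination();
    }
}

Opening the HBase connection once per partition rather than once per record is the usual pattern here, since HBase connections are not serializable and are expensive to create.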
When executed, the Spark Streaming Job fails with the following exception:

[ERROR]: org.apache.spark.SparkContext - Error initializing SparkContext.
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:124)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:64)
    at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:541)
    at org.apache.spark.streaming.StreamingContext$.createNewSparkContext(StreamingContext.scala:874)
    at org.apache.spark.streaming.StreamingContext.<init>(StreamingContext.scala:81)
    at org.apache.spark.streaming.api.java.JavaStreamingContext.<init>(JavaStreamingContext.scala:140)
    at bdadvanced_spark.streamingestion_0_1.StreamIngestion.runJobInTOS(StreamIngestion.java:972)
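In yarn-client mode, this exception only indicates that the YARN ApplicationMaster terminated before the driver could connect; the underlying reason is recorded on the cluster side. If the cause is not obvious, the YARN application logs can usually be retrieved with yarn logs -applicationId <application ID>, where the application ID is the one printed in the Job's console output during startup.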
Problem root cause

The tHDFSConfiguration component is not present in the Job, but one is required in every Spark Big Data Job, whether Batch or Streaming. On YARN, Spark needs this HDFS connection to stage the Job's dependencies before the ApplicationMaster can start, which is why the launch fails with the exception above.
Solution or Workaround

Ensure that a tHDFSConfiguration component is present in the Job. If it is not, add a tHDFSConfiguration component configured for the Hadoop cluster that the Spark Job accesses.
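In raw Spark terms, what tHDFSConfiguration supplies is the HDFS connection that the YARN client needs in order to upload the Job's jars to its .sparkStaging directory before the ApplicationMaster starts. A minimal sketch of the equivalent manual configuration, assuming a hypothetical NameNode URI (Spark copies any property prefixed with spark.hadoop. into the underlying Hadoop Configuration):

import org.apache.spark.SparkConf;

SparkConf conf = new SparkConf()
        .setAppName("StreamIngestion")
        .setMaster("yarn-client")
        // Roughly what tHDFSConfiguration contributes: which HDFS the YARN
        // client should use for staging files. The URI below is a placeholder.
        .set("spark.hadoop.fs.defaultFS", "hdfs://namenode:8020");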