
Spark Streaming Job - Exception in thread "streaming-job-executor-0"

Hi everyone,


I'm trying to design a simple Spark Streaming job on the Sandbox - generating rows with the row generator component and loading them into HDFS. The Spark configuration is attached. When I execute the job, I get the following error -


Exception in thread "streaming-job-executor-0" java.lang.Error: java.lang.InterruptedException
[ERROR]: org.apache.spark.scheduler.cluster.YarnScheduler - Lost executor 1 on Container marked as failed: container_1497799969116_0004_01_000002 on host: talend-cdh580.weave.local. Exit status: 50. Diagnostics: Exception from container-launch.
Container id: container_1497799969116_0004_01_000002
Exit code: 50


The detailed log file is attached.


Can someone please help me with this error?


Re: Spark Streaming Job - Exception in thread "streaming-job-executor-0"


Could you please indicate on which build version you got this issue?

Best regards



Re: Spark Streaming Job - Exception in thread "streaming-job-executor-0"

6.3, 1.7


Re: Spark Streaming Job - Exception in thread "streaming-job-executor-0"

OK, this is weird... I did not change anything. I just restarted the VM and executed the job again, and it looks like it ran and produced the result. I can still see a few warnings and errors in the log file (attached), but the HDFS folder is showing me output files. Can someone help me understand what's going on here?






Re: Spark Streaming Job - Exception in thread "streaming-job-executor-0"

Hi, to try and answer your questions...


The first warning you are seeing is simply that - a warning. The Spark UI tries to bind to the default port 4040. If it can't, it keeps trying 4041, 4042, and so on, until it either times out or connects. You can avoid this by setting a property in the Spark Configuration of Talend Studio to a known open port.
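For reference, the standard Spark property that controls this is `spark.ui.port`. A minimal sketch of how it could be set (the port 4050 is just an example - use any port you know is free on your host):

```
# In Talend Studio: Spark Configuration > Advanced properties
#   Property: "spark.ui.port"    Value: "4050"

# Equivalent setting when submitting outside of Studio:
spark-submit --conf spark.ui.port=4050 ...
```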



As for the larger error, it looks like the job did start: it processed 9 batches and failed on the 10th, caused by a lost executor/container. To understand the error better you would probably have to dig into the cluster logs (however, since you shut down the cluster container and/or VM, I believe all historical data is lost). If you run into the issue again, you would have to pull the cluster logs before shutting down. But keep in mind that this is a sandbox environment, and the failure was likely caused by resource contention.
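On a YARN cluster, the container logs can usually be pulled with the `yarn logs` CLI once the application has finished. A sketch, assuming YARN log aggregation is enabled on the Sandbox (the application ID below is derived from the container ID in your error message):

```
# Pull the aggregated logs for the failed application
# (container_1497799969116_0004_01_000002 belongs to
#  application_1497799969116_0004)
yarn logs -applicationId application_1497799969116_0004 > app.log
```

If log aggregation is not enabled, the same logs live on the node itself under the NodeManager's local log directory while the container is still around.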