Spark jobs failing with errors: "Diagnostics: Container killed on request. Exit code is 143" and "Lost executor 3"

Issue

Spark jobs fail with the errors "Diagnostics: Container killed on request. Exit code is 143" and "Lost executor 3".

 

Diagnosis

Cores, memory, and memoryOverhead are the three settings you can tune to make a job succeed in this case. Exit code 143 indicates that the container received a termination signal, which typically happens when YARN kills a container for exceeding its memory allocation, so the fix is to give the driver and executors enough resources. Changing a few parameters in the Spark configuration file resolves the issue.

 

Cores

The number of cores you configure per executor (for example, four versus eight) is significant because it determines how many tasks can run concurrently. With four cores, an executor runs up to four tasks in parallel, and those tasks share the executor's execution memory. Here are the two relevant parameters:

spark.executor.cores
spark.driver.cores
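
For example, the core counts can be raised in spark-defaults.conf. The values below are illustrative only; choose them based on the vCPUs actually available on your nodes:

spark.executor.cores   4
spark.driver.cores     2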

 

Memory

Memory is just as important. The executor's heap is shared by however many tasks run concurrently, so the number of cores and the heap size together determine how much memory each task effectively gets. Here are the two relevant properties:

spark.executor.memory
spark.driver.memory
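
For example, in spark-defaults.conf (illustrative values; with spark.executor.cores set to 4, the four concurrent tasks would share this 8 GB executor heap):

spark.executor.memory   8g
spark.driver.memory     4g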

 

MemoryOverhead

memoryOverhead is the additional off-heap memory allocated to each container (the driver or the executors) on top of its heap. A container is allowed to run until its total memory use reaches the heap-plus-overhead limit; once it exceeds that limit, it is generally killed by YARN. Here are the two relevant properties:

spark.yarn.executor.memoryOverhead
spark.yarn.driver.memoryOverhead
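
For example, to give each executor container an extra 2 GB of headroom beyond its heap, you could add the following to spark-defaults.conf (these properties take values in megabytes; the numbers are illustrative). Note that Spark 2.3 and later renamed them to spark.executor.memoryOverhead and spark.driver.memoryOverhead, so use the names that match your Spark version:

spark.yarn.executor.memoryOverhead   2048
spark.yarn.driver.memoryOverhead     1024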

 

Resolution

Increase the values of the following parameters according to your cluster's capacity:

spark.executor.cores
spark.driver.cores
spark.executor.memory
spark.driver.memory
spark.yarn.executor.memoryOverhead
spark.yarn.driver.memoryOverhead
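
As a sketch, the same settings can also be passed at submit time instead of in spark-defaults.conf; the application JAR name and all values below are placeholders to adapt to your job and hardware:

spark-submit \
  --master yarn \
  --conf spark.executor.cores=4 \
  --conf spark.driver.cores=2 \
  --conf spark.executor.memory=8g \
  --conf spark.driver.memory=4g \
  --conf spark.yarn.executor.memoryOverhead=2048 \
  --conf spark.yarn.driver.memoryOverhead=1024 \
  your-application.jar

If the job still fails with exit code 143 after raising the executor memory, increasing memoryOverhead is usually the next step, since off-heap and native memory usage is not covered by spark.executor.memory.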