Spark jobs fail with errors such as: "Diagnostics: Container killed on request. Exit code is 143" and "Lost executor 3".
Cores, memory, and memory overhead are the three settings you can tune to make a job succeed in this case. Adjusting a few parameters in the Spark configuration usually resolves the issue.
The number of cores you configure (for example, four or eight) is significant because it determines how many tasks an executor can run concurrently. With four cores, an executor runs four tasks in parallel, and all of those tasks share the executor's memory, so the core count directly affects how much execution memory each task gets. The two relevant properties are spark.executor.cores and spark.driver.cores.
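As a rough sketch of that relationship (the executor counts and memory sizes below are illustrative values, not Spark defaults):

```python
# Illustrative arithmetic: core count drives task concurrency, and the
# executor heap is shared among the tasks running on those cores.
# All values here are hypothetical examples.
num_executors = 10
executor_cores = 4        # spark.executor.cores
executor_memory_gb = 8    # spark.executor.memory

concurrent_tasks = num_executors * executor_cores
per_task_memory_gb = executor_memory_gb / executor_cores

print(concurrent_tasks)     # 40 tasks cluster-wide
print(per_task_memory_gb)   # 2.0 GB of heap per concurrent task
```

Doubling the cores doubles concurrency but halves the heap share per task, which is why raising cores without raising memory can push a container over its limit.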
Memory is important too. Because the heap is divided among the tasks running on an executor's cores, the number of cores and the heap size together determine how much memory each task has to work with. The two relevant properties are spark.executor.memory and spark.driver.memory.
memoryOverhead is off-heap memory allocated on top of the heap. A container (the driver or an executor) can run until its total memory use exceeds the configured memory plus the memoryOverhead limit; once it crosses that limit, YARN kills the container, which typically surfaces as exit code 143. The two relevant properties are spark.yarn.executor.memoryOverhead and spark.yarn.driver.memoryOverhead.
Increase the value of the following parameters according to your system capability:
spark.executor.cores
spark.driver.cores
spark.executor.memory
spark.driver.memory
spark.yarn.executor.memoryOverhead
spark.yarn.driver.memoryOverhead
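One way to set all six at submit time is via --conf flags; the values below are illustrative and should be sized to your cluster rather than copied verbatim, and your_job.py stands in for your application:

```shell
# Hypothetical spark-submit raising cores, memory, and memoryOverhead.
# Tune each value to your node sizes; these are example figures only.
spark-submit \
  --conf spark.executor.cores=4 \
  --conf spark.driver.cores=2 \
  --conf spark.executor.memory=8g \
  --conf spark.driver.memory=4g \
  --conf spark.yarn.executor.memoryOverhead=1024 \
  --conf spark.yarn.driver.memoryOverhead=1024 \
  your_job.py
```

The same properties can instead be set once in spark-defaults.conf so every job picks them up.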