When running Spark 1.6 on yarn clusters, i ran into problems, when yarn preempted spark containers and then the spark job failed. This happens only sometimes, when yarn used a fair scheduler and other queues with a higher priority submitted a job. After some research i found the solution: dynamic allocation.
Accessing preempted containers
In my understanding each container in spark handles temporary shuffle files on their own until a external shuffle service is used. The external shuffle service is also used, when dynamic allocation is active, which is nothing more than automatic resource handling on the cluster. When there is no external shuffle service and a yarn container used by spark is preempted, you can see something like this in the logs:
10:32:09 [user@host 2][ERROR] 16/07/08 03:32:09 ERROR cluster.YarnScheduler: Lost executor 26 on nodeXYZ: Container container_e15_1467912824020_1783_01_006762 on host: nodeXYZ was preempted.
10:34:52 [user@host 2][ERROR] java.io.IOException: Failed to send RPC 5259997239214050418 to nodeXYZ/10.101.173.174:36616: java.nio.channels.ClosedChannelException
As you can see the container gets first preempted and for a while everything seems ok, until some other node is trying to access something on the preempted container, which leads to an IOException. Finally the job will fail.
When Spark uses an external shuffle service, the service has control over all the temporary shuffle files and so containers can be removed safely, without running into these errors. To enable dynamic allocation you can look into the spark documentation.
Here is some further information and it seems like Spark 2.0 is not affected by these issue, so it would may be an option to wait for the Spark 2.0 release and try to fix it with the upgrade.