pyspark - Spark Application Fails Every 50 Days – Driver Memory Shows 98.1 GB / 19.1 GB


I am facing an issue where my Spark application fails approximately once every 50 days, yet I don't see any errors in the application logs. The only clue I have found is in the NodeManager logs, which show the following warning:

WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception from container-launch with container ID: container_e225_1708884103504_1826568_02_000002 and exit code: 1
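The full container log can be pulled with the YARN CLI once the application has finished; below is a minimal sketch of that lookup (the application ID is derived from the container ID above, and the yarn binary is assumed to be on PATH):

    import subprocess

    # Sketch: fetch the aggregated YARN log for the failed container.
    # The application ID is reconstructed from the container ID in the warning above.
    app_id = "application_1708884103504_1826568"
    container_id = "container_e225_1708884103504_1826568_02_000002"

    # 'yarn logs' prints the container's stdout/stderr after log aggregation.
    result = subprocess.run(
        ["yarn", "logs", "-applicationId", app_id, "-containerId", container_id],
        capture_output=True,
        text=True,
    )
    print(result.stdout)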

After restarting the application, I checked the memory usage of both the executors and the driver. In the Spark UI, the driver's memory usage looks unusual: it shows 98.1 GB / 19.1 GB.

  • My Spark version is 2.4.0.
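For anyone reproducing this, the same figures can also be read from Spark's monitoring REST API (available in 2.4). This is only a sketch, assuming the driver UI is reachable on the default port 4040; the host name below is a placeholder:

    import requests  # assumes the 'requests' package is installed

    # Placeholder: replace with the real driver host (default UI port is 4040).
    ui = "http://driver-host:4040"

    # Look up the running application's ID, then read per-executor memory figures.
    app_id = requests.get(f"{ui}/api/v1/applications").json()[0]["id"]
    for ex in requests.get(f"{ui}/api/v1/applications/{app_id}/executors").json():
        if ex["id"] == "driver":
            # memoryUsed / maxMemory (bytes) are the used and total storage memory
            # for the driver, the same pair the UI renders as "98.1 GB / 19.1 GB".
            print(ex["memoryUsed"] / 1024**3, "GiB used of",
                  ex["maxMemory"] / 1024**3, "GiB")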

My Questions:

  • What does 98.1 GB / 19.1 GB in the Spark UI Storage tab for the driver indicate?
  • Could this excessive driver memory usage be the reason for my application's failure?
  • How can I debug or find the root cause of why my application fails once every 50 days?

Any insights or suggestions would be greatly appreciated!
