TechAmalgam· Apache Spark

Spark Applications

The driver, the executors, and the SparkSession.

A Spark Application consists of a driver process and a set of executor processes. The driver runs your main() function, sits on a node in the cluster, and is responsible for three things: maintaining information about the Spark Application; responding to a user's program or input; and analyzing, distributing, and scheduling work across the executors. The driver is the heart of a Spark Application and maintains all relevant information for the lifetime of the application.

The executors carry out the work the driver assigns. Each executor is responsible for only two things: executing code assigned to it by the driver, and reporting the state of the computation on that executor back to the driver node.
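The division of labor described above can be sketched with a toy model in plain Python. This is not Spark code and does not reflect Spark's internals: the names `TaskReport`, `run_task`, and `driver` are hypothetical, and a thread pool stands in for the executor processes. It only illustrates the pattern of a driver splitting and scheduling work while executors run it and report back.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class TaskReport:
    """What a toy executor sends back to the driver (hypothetical type)."""
    partition_id: int
    status: str
    result: int


def run_task(partition_id: int, partition: list[int]) -> TaskReport:
    """Executor side: run the assigned code, then report state and result."""
    return TaskReport(partition_id, "SUCCESS", sum(partition))


def driver(data: list[int], num_executors: int = 4) -> tuple[int, list[TaskReport]]:
    """Driver side: split the work, schedule it, and track every report."""
    # Analyze and distribute: one round-robin partition per toy executor.
    partitions = [data[i::num_executors] for i in range(num_executors)]
    # Schedule: each pool worker stands in for one executor process.
    with ThreadPoolExecutor(max_workers=num_executors) as pool:
        reports = list(pool.map(run_task, range(num_executors), partitions))
    # Maintain application state: combine what the executors reported.
    return sum(r.result for r in reports), reports


total, reports = driver(list(range(1, 101)))
print(total)                        # 5050
print([r.status for r in reports])  # one status per toy executor
```

In this sketch only the driver sees the whole dataset and the combined result. Each toy executor sees just its own partition, which mirrors the two executor responsibilities listed above.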

The driver (left) coordinating work across four executors (right).

You control your Spark Application through an object in the driver process called the SparkSession. The SparkSession instance is the entry point through which Spark executes user-defined manipulations across the cluster.
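As a minimal sketch of what this looks like in practice, the PySpark function below creates a SparkSession, runs one small job through it, and shuts it down. It assumes PySpark and a Java runtime are installed. The function name `count_evens`, the application name, and the `local[2]` master are illustrative choices, not taken from the text above. In local mode the driver and the executor threads share a single JVM, so this shows the API and not a real cluster deployment.

```python
def count_evens(n: int = 1000) -> int:
    """Start a local SparkSession, run one small job, and stop the session."""
    # Imported here so this file can be loaded without PySpark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[2]")       # assumption: local mode with two worker threads
        .appName("driver-demo")   # hypothetical application name
        .getOrCreate()
    )
    try:
        numbers = spark.range(n)  # DataFrame with one `id` column, 0 to n - 1
        # The filter and count are planned by the driver and run as tasks.
        return numbers.where("id % 2 = 0").count()
    finally:
        spark.stop()              # release the driver's resources
```

Calling `count_evens(1000)` should return the number of even ids in the range. The user code never addresses executors directly: every instruction goes through the `spark` object.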