Basic Architecture
Why Spark needs a cluster, and who manages it.
A single machine works well for watching movies or working in a spreadsheet, but it lacks the power and resources to run computations over huge amounts of data, or it would take longer than you can afford to wait. A cluster, or group, of computers pools the resources of many machines so that we can use their combined resources as if they were a single computer.
A group of machines is not powerful on its own; you need a framework to coordinate work across them. Spark does exactly that: it manages and coordinates the execution of tasks on data across a cluster.
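To make the idea of coordination concrete, here is a toy sketch in plain Python. It is not Spark code and does not use a cluster: threads on one machine stand in for worker machines, and the names split_into_partitions and run_job are hypothetical helpers invented for this illustration. It shows only the general pattern of splitting data into partitions, running the same task on each partition, and combining the partial results.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(data, num_partitions):
    """Split a list into roughly equal chunks, one per worker."""
    # Ceiling division; max(1, ...) keeps the step valid for empty input.
    size = max(1, -(-len(data) // num_partitions))
    return [data[i:i + size] for i in range(0, len(data), size)]


def run_job(data, task, combine, num_workers=4):
    """Coordinator: hand each partition to a worker, then merge the results."""
    partitions = split_into_partitions(data, num_workers)
    # Threads stand in for the machines of a cluster in this toy example.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial_results = list(pool.map(task, partitions))
    return combine(partial_results)


# Sum the numbers 1..100: each worker sums its own partition,
# then the coordinator sums the partial sums.
total = run_job(list(range(1, 101)), task=sum, combine=sum)
print(total)
```

A real framework also has to handle machines that fail, data that does not fit in memory, and moving data between machines, which is why a coordinating layer like Spark is needed rather than a few lines of glue code.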
The cluster of machines that Spark uses is managed by a cluster manager such as Spark's standalone cluster manager, YARN, or Mesos. We submit Spark Applications to these cluster managers, which grant resources to our application so that we can complete our work.
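In practice, an application tells Spark which cluster manager to use through a master URL, for example via the --master option of spark-submit. The sketch below is a hypothetical helper, not part of Spark's API, that maps the common master URL forms to the cluster managers named above. The host names and ports in the usage lines are made up for illustration.

```python
def cluster_manager_for(master_url: str) -> str:
    """Name the cluster manager implied by a Spark master URL.

    Hypothetical helper for illustration only; Spark does this
    resolution internally when an application is submitted.
    """
    if master_url.startswith("local"):
        # "local", "local[4]", "local[*]": everything runs on one machine.
        return "none (local mode)"
    if master_url.startswith("spark://"):
        return "Spark standalone"
    if master_url == "yarn":
        # YARN's location comes from the Hadoop configuration, not the URL.
        return "YARN"
    if master_url.startswith("mesos://"):
        return "Mesos"
    raise ValueError(f"unrecognized master URL: {master_url!r}")


# Example master URLs; the hosts and ports are placeholders.
for url in ["local[*]", "spark://master-host:7077", "yarn", "mesos://mesos-host:5050"]:
    print(url, "->", cluster_manager_for(url))
```

The same application code can run against any of these managers; only the master URL passed at submission time changes.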