STAC Report: Spark Resource Managers, Phase 2

Spark throughput was highest when managed by IBM Spectrum Conductor with Spark.

19 May 2017

Last year we used the open-source Spark Multi-User Benchmark (SMB), developed by IBM, to study the performance of Apache Spark on three different resource managers: Apache YARN, Apache Mesos, and IBM Spectrum Conductor with Spark. That version of the benchmark (SMB-1) had just one job type (a small sort job), submitted in a pattern designed to simulate an interactive workload.

This year we tested the same three products using a new version of the benchmark (SMB-2) that has more job types and submission patterns. SMB-2 includes both batch and interactive jobs (short queries, long queries, machine learning), which it studies not only in isolation but also in mixed workloads, including a multi-tenant scenario in which job types have differing business priorities.

The STAC Report from this project is now available. Results highlights:

  • Spark throughput was highest when managed by IBM Spectrum Conductor with Spark, for interactive, batch, and mixed workloads:
    • 25-88% higher than Apache Mesos
    • 30-224% higher than Apache YARN
  • Spark performance was steadiest when managed by IBM Spectrum Conductor with Spark, as measured by relative standard deviation (RSD) of job durations:
    • In mixed interactive and batch workloads, weighted average RSD was just 28-34% for IBM Spectrum Conductor with Spark, compared to 60-84% for Apache YARN and 73-92% for Apache Mesos
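
For readers unfamiliar with the metric, here is a minimal sketch of how a weighted average RSD of job durations could be computed. RSD is the standard deviation divided by the mean, expressed as a percentage. The weighting by job count per job type, the use of population standard deviation, and the sample durations are all assumptions made for illustration; the STAC Report defines the exact method used in the tests.

```python
from statistics import mean, pstdev


def rsd(durations):
    """Relative standard deviation in percent: std dev divided by mean.

    Uses population standard deviation (an assumption for this sketch).
    """
    return 100.0 * pstdev(durations) / mean(durations)


def weighted_average_rsd(durations_by_type):
    """Average the per-job-type RSDs, weighting each by its job count.

    Weighting by job count is an assumed scheme, not taken from the report.
    """
    total_jobs = sum(len(d) for d in durations_by_type.values())
    weighted_sum = sum(len(d) * rsd(d) for d in durations_by_type.values())
    return weighted_sum / total_jobs


if __name__ == "__main__":
    # Hypothetical job durations in seconds, grouped by job type.
    sample = {
        "short_query": [10.0, 12.0, 8.0, 10.0],
        "long_query": [100.0, 140.0],
    }
    print(f"Weighted average RSD: {weighted_average_rsd(sample):.1f}%")
```

A lower RSD means job durations cluster more tightly around their mean, which is why the metric serves as a measure of steadiness.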

The STAC Report has detailed results as well as a thorough discussion of the test methodology and its constraints. I'll also discuss this project at the upcoming STAC Summits in New York, Chicago, and London, and at the Spark Summit in San Francisco.

* A detailed STAC Configuration Disclosure is available via the same link to members of the STAC Benchmark Council with premium subscriptions. For information on premium subscriptions, please contact us.