STAC Report: Comparison of MapReduce Implementations
Audited benchmark shows Apache Hadoop with IBM Platform Symphony performed jobs 7.3 times faster, on average, than pure Apache Hadoop.
7 November 2012
STAC has just released a STAC Report analyzing the performance benefit of IBM Platform Symphony for Hadoop MapReduce workloads. The report is available here for no charge.
MapReduce is a framework for data-aware distributed computing that is enjoying substantial market uptake. As users move more MapReduce projects from development to operations, they are paying increasing attention to production-related issues. One of these is performance: how quickly MapReduce jobs can be completed with a given amount of computing resources.
IBM® has created a proprietary implementation of the open-source Hadoop MapReduce run-time that leverages the IBM Platform™ Symphony distributed computing middleware while maintaining application-level compatibility with Apache Hadoop. IBM claims that this implementation of MapReduce run-time components can accelerate various open-source and commercial MapReduce implementations. IBM asked STAC to compare Apache Hadoop to Apache Hadoop accelerated by IBM Platform Symphony Advanced Edition in the same hardware environment, using an off-the-shelf workload written to the Hadoop MapReduce API.
On average, Apache Hadoop accelerated by Symphony completed jobs 7.3 times faster than Apache Hadoop alone, and the total processing time for all jobs fell by a factor of 6. The STAC Report details the test methodology, system configuration, and test results.
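The two headline figures measure different things, which is why they need not match. The sketch below illustrates the distinction, assuming the average is an arithmetic mean of per-job speedup ratios (the announcement does not state how the average was computed). The function names and the job durations are hypothetical and are not taken from the STAC Report.

```python
def mean_speedup(baseline, accelerated):
    """Arithmetic mean of per-job speedup ratios (baseline time / accelerated time)."""
    ratios = [b / a for b, a in zip(baseline, accelerated)]
    return sum(ratios) / len(ratios)


def total_time_reduction(baseline, accelerated):
    """Factor by which the summed run time of all jobs shrinks."""
    return sum(baseline) / sum(accelerated)


# Hypothetical job durations in seconds, for illustration only.
baseline_s = [100.0, 400.0, 900.0]
accelerated_s = [10.0, 80.0, 300.0]

print(f"mean per-job speedup: {mean_speedup(baseline_s, accelerated_s):.2f}x")
print(f"total time reduction: {total_time_reduction(baseline_s, accelerated_s):.2f}x")
```

In this made-up data the short job speeds up the most, so the mean of the ratios comes out higher than the reduction in total time, which is dominated by the long jobs.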
At the STAC Summit in London, we'll be talking about this project as well as proposals for other "big data" benchmark specifications, which will be one activity of the STAC Big Data SIG in 2013. STAC members interested in this conversation are encouraged to let us know here.