Comparison of IBM InfoSphere BigInsights Enterprise Edition with Apache Hadoop using SWIM

IBM asked STAC® to compare pure Apache Hadoop to BigInsights® with Adaptive MapReduce enabled, in the same hardware environment, using an off-the-shelf workload written to the Hadoop MapReduce API. This report documents and analyzes the test results and describes the system configuration and test methodology in detail.

For this project, we used the Statistical Workload Injector for MapReduce (SWIM) developed by the University of California at Berkeley. SWIM provides a large set of diverse MapReduce jobs based on production Hadoop traces obtained from Facebook, along with information to enable characterization of each job.

The testbed hardware consisted of 17 compute servers and 1 master server communicating over gigabit Ethernet. We compared Apache Hadoop version 1.1.2 to IBM BigInsights version 2.1.0.1. Both systems used default configurations except where noted.

Stack under test:

  • IBM InfoSphere BigInsights Enterprise Edition 2.1.0.1 or Apache Hadoop 1.1.2
  • 18 x IBM System x3630 M3 servers
  • Red Hat Enterprise Linux 6.4
  • 12 x IBM 2TB SAS Hard Drives per server
  • 2 x 6-core Intel(R) Xeon(R) E5645 @ 2.4GHz ("Westmere")
  • Mellanox MT26448 ConnectX EN 10Gbps Adapters
  • Juniper Networks QFX35
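
To give a rough sense of the scale of this testbed, the stack list can be turned into aggregate figures. The sketch below is illustrative only and is not part of the STAC report. It assumes that the CPU line describes each server (two 6-core sockets per server), counts physical cores rather than hardware threads, and reports raw disk capacity before any HDFS replication or filesystem overhead. The function name `cluster_totals` is hypothetical.

```python
# Figures taken from the stack list; the 17 compute / 1 master split comes
# from the testbed description. ASSUMPTION: the CPU line applies per server.
SERVERS = 18
COMPUTE_SERVERS = 17
DRIVES_PER_SERVER = 12
DRIVE_TB = 2
SOCKETS_PER_SERVER = 2   # assumed per server
CORES_PER_SOCKET = 6


def cluster_totals():
    """Return aggregate physical cores and raw disk capacity (in TB)."""
    cores_per_server = SOCKETS_PER_SERVER * CORES_PER_SOCKET
    disk_tb_per_server = DRIVES_PER_SERVER * DRIVE_TB
    return {
        "compute_cores": COMPUTE_SERVERS * cores_per_server,
        "compute_raw_disk_tb": COMPUTE_SERVERS * disk_tb_per_server,
        "total_raw_disk_tb": SERVERS * disk_tb_per_server,
    }


if __name__ == "__main__":
    for name, value in cluster_totals().items():
        print(f"{name}: {value}")
```

These are upper bounds on raw resources; the capacity actually available to MapReduce jobs depends on configuration details covered in the full report.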


Big data has become a big topic at STAC. A quick glance at the discussions and presentations at STAC Summits over the past few years (particularly in New York and London) confirms this.