STAC Reports: Intel's new Haswell server chip, with and without Xeon Phi (STAC-A2)
System with two Haswell processors and one Xeon Phi sets records for warm runs of the baseline STAC-A2 speed benchmark.
September 9, 2014
Today Intel launched its "Haswell EP" processors, the enterprise version of its latest chip architecture. Among other things, these processors support additional instruction set extensions such as AVX2 to boost the parallel number-crunching ability of applications.
Intel recently asked STAC to use the STAC-A2 Benchmark suite to test a 2-socket white box server with Haswell EP processors, both with and without an Intel Xeon Phi co-processor card. The implementation code ("STAC Pack") that we tested with the Haswell-only configuration was the same source code used in a test with the previous-generation Ivy Bridge processors, but recompiled to take advantage of the Haswell instruction sets. For the Haswell + Phi configuration, Intel took the code it used in an Ivy Bridge + Phi test and re-balanced which computations ran on the CPUs and which ran on the co-processors, allowing the more capable Haswell CPUs to take on more of the load.
STAC-A2 is the user-developed benchmark standard based on financial market risk analysis. Designed by quants and technologists from some of the world's largest banks, STAC-A2 reports the performance, scaling, quality, and resource efficiency of any technology stack that is able to handle the workload (Monte Carlo estimation of Heston-based Greeks for a path-dependent, multi-asset option with early exercise).
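For readers unfamiliar with this kind of workload, the following is a minimal sketch of Monte Carlo estimation of a single Greek (delta) under the Heston model. It is heavily simplified and is not the STAC-A2 specification or the code Intel submitted: it assumes one asset, a plain European call with no early exercise, a full-truncation Euler discretization, and bump-and-revalue with common random numbers. All function names and parameter values are hypothetical.

```python
import math
import random


def heston_call_price(s0, strike, rate, maturity, v0, kappa, theta, xi, rho,
                      n_paths, n_steps, seed):
    """Monte Carlo price of a European call under Heston dynamics.

    Illustrative only: single asset, no early exercise, full-truncation Euler.
    """
    rng = random.Random(seed)  # fixed seed so bumped runs reuse the same draws
    dt = maturity / n_steps
    sqrt_dt = math.sqrt(dt)
    rho_comp = math.sqrt(1.0 - rho * rho)
    payoff_sum = 0.0
    for _ in range(n_paths):
        log_s = math.log(s0)
        v = v0
        for _ in range(n_steps):
            z1 = rng.gauss(0.0, 1.0)
            # Correlate the variance shock with the asset shock.
            z2 = rho * z1 + rho_comp * rng.gauss(0.0, 1.0)
            v_pos = max(v, 0.0)  # full truncation: negative variance treated as 0
            vol = math.sqrt(v_pos)
            log_s += (rate - 0.5 * v_pos) * dt + vol * sqrt_dt * z1
            v += kappa * (theta - v_pos) * dt + xi * vol * sqrt_dt * z2
        payoff_sum += max(math.exp(log_s) - strike, 0.0)
    return math.exp(-rate * maturity) * payoff_sum / n_paths


def heston_delta(s0, bump=0.01, **params):
    """Central-difference delta: reprice with the spot bumped up and down."""
    up = heston_call_price(s0 * (1.0 + bump), **params)
    down = heston_call_price(s0 * (1.0 - bump), **params)
    return (up - down) / (2.0 * s0 * bump)


if __name__ == "__main__":
    # Hypothetical parameters chosen only to make the sketch runnable.
    example = dict(strike=100.0, rate=0.02, maturity=1.0, v0=0.04, kappa=1.5,
                   theta=0.04, xi=0.3, rho=-0.7, n_paths=2000, n_steps=50,
                   seed=42)
    print("price:", heston_call_price(100.0, **example))
    print("delta:", heston_delta(100.0, **example))
```

Each Greek requires repricing over many independent paths, which is why this class of workload parallelizes well across vector units, cores, and co-processors. The real benchmark workload is considerably heavier than this sketch, since it adds multiple correlated assets, path dependence, and early exercise.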
Key results include:
- The system with two Haswell processors and one Xeon Phi was the fastest of any system published to date in warm runs of the end-to-end Greeks benchmark (STAC-A2.ß2.GREEKS.TIME.WARM). In the same benchmark, this system was 22% faster than a system with 2 CPUs plus 2 GPUs (SUT ID NVDA131118). It also had 46% higher asset capacity (STAC-A2.ß2.GREEKS.MAX_ASSETS) and 53% higher paths capacity (STAC-A2.ß2.GREEKS.MAX_PATHS).
- The Haswell system without a Phi co-processor was 30% faster in STAC-A2.ß2.GREEKS.TIME.WARM than an equivalent system with two Ivy Bridge processors running the same implementation source code. And this 2-CPU system was only 12% slower than the 2-CPU/2-GPU system mentioned above, while demonstrating a 46% higher asset capacity.
The reports are available to the public at the link above. As always, premium STAC subscribers can also access the source code and binaries used in these tests, as well as the micro-detailed configuration information for the systems tested.