STAC Report: STAC-A2 on KNL

Single-socket system beats larger systems in efficiency and almost all systems on speed

20 June 2016

Today Intel released the Intel Xeon Phi 7250 Processor (code named Knights Landing or KNL). As attendees at the last round of STAC Summits will know, STAC tested a KNL-based system prior to launch using the STAC-A2 benchmark suite. The STAC Report from those tests is now available here.

This next generation of Xeon Phi directly accesses main memory and operates as a standalone processor, with no need for a host processor. Intel uses a 14-nanometer process to pack up to 72 physical cores into the processor, along with a significant amount of high-bandwidth multi-channel DRAM (MCDRAM). The KNL processor we tested had 68 cores and 16GB of MCDRAM. Other key components of the stack under test (SUT) were Rev H of the Intel STAC-A2 Pack for Composer XE (the benchmark implementation code) and the CentOS 7.2 kernel updated with the Intel(R) Manycore Platform SW Package (MPSP) v1.2.2, all running in an Intel White Box with 96GB DRAM. Interestingly, this was the first single-socket system we have ever tested with STAC-A2.

STAC-A2 is the technology benchmark standard based on financial market risk analysis. Designed by quants and technologists from some of the world's largest banks, STAC-A2 reports the performance, scaling, quality, and resource efficiency of any technology stack that is able to handle the workload (Monte Carlo estimation of Heston-based Greeks for a path-dependent, multi-asset option with early exercise).
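To give a rough sense of the kind of computation involved, the sketch below estimates a price and a delta by Monte Carlo under the Heston model. It is a deliberate simplification and not the STAC-A2 specification or the Intel implementation: it assumes a single asset and a plain European call (no path dependence, no early exercise), a full-truncation Euler scheme, and bump-and-revalue with common random numbers for the Greek. The function names (heston_call_price, delta_bump) and all parameter values are hypothetical.

```python
import math
import random


def heston_call_price(s0, v0, strike, rate, kappa, theta, xi, rho,
                      maturity, n_steps, n_paths, seed):
    """Monte Carlo price of a European call under Heston dynamics.

    Uses a full-truncation Euler scheme: the variance is floored at zero
    wherever it enters the drift or diffusion terms.
    """
    rng = random.Random(seed)  # fixed seed gives common random numbers across bumps
    dt = maturity / n_steps
    sqrt_dt = math.sqrt(dt)
    rho_bar = math.sqrt(1.0 - rho * rho)
    payoff_sum = 0.0
    for _ in range(n_paths):
        log_s, v = math.log(s0), v0
        for _ in range(n_steps):
            z1 = rng.gauss(0.0, 1.0)
            # Correlate the variance shock with the asset shock.
            z2 = rho * z1 + rho_bar * rng.gauss(0.0, 1.0)
            v_pos = max(v, 0.0)
            vol = math.sqrt(v_pos)
            log_s += (rate - 0.5 * v_pos) * dt + vol * sqrt_dt * z1
            v += kappa * (theta - v_pos) * dt + xi * vol * sqrt_dt * z2
        payoff_sum += max(math.exp(log_s) - strike, 0.0)
    return math.exp(-rate * maturity) * payoff_sum / n_paths


def delta_bump(s0, bump=0.01, **params):
    """Central-difference delta: reprice with the spot bumped up and down."""
    up = heston_call_price(s0 * (1.0 + bump), **params)
    down = heston_call_price(s0 * (1.0 - bump), **params)
    return (up - down) / (2.0 * s0 * bump)


# Hypothetical model and simulation parameters, chosen only for illustration.
example_params = dict(v0=0.04, strike=100.0, rate=0.02, kappa=1.5, theta=0.04,
                      xi=0.3, rho=-0.7, maturity=1.0,
                      n_steps=50, n_paths=2000, seed=42)

if __name__ == "__main__":
    print("price:", heston_call_price(100.0, **example_params))
    print("delta:", delta_bump(100.0, **example_params))
```

Each Greek computed this way needs additional full repricings, and the paths are independent of one another, which is one reason workloads of this type tend to reward hardware with many cores and high memory bandwidth.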

Key results included:

  • This system took just 0.216 seconds in warm runs of the baseline Greeks benchmark (STAC-A2.β2.GREEKS.TIME.WARM). That is:
    • Within 2% of a dual-Haswell/dual-Knights Corner system (INTC151028)
    • 24% faster than a 4-socket Haswell system (INTC150811)
    • 30% faster than the fastest reported GPU-based system (NVDA141116)
    • 44% faster than the fastest reported non-Intel CPU-based system (IBM150305)
  • In terms of efficiency (STAC-A2.β2.GREEKS.ENERGY_EFFICIENCY and STAC-A2.β2.GREEKS.SPACE_EFFICIENCY), this system had:
    • 2x the energy efficiency of INTC151028 and 56% higher space efficiency
    • Over 2x the energy efficiency and over 5.7x the space efficiency of IBM150305
  • Cold runs of the baseline Greeks benchmark (STAC-A2.β2.GREEKS.TIME.COLD) were 4.3x the speed of the previous best Xeon Phi results (INTC150130).

For details, please see the report at the link above. Premium subscribers also have access to the code used in this project and the micro-detailed configuration information for the solution. To learn about subscription options, please contact us.
