STAC Report: STAC-A3 on Google Cloud with Python & HPAT
2.4x the previous best throughput of a system using 5 nodes
27 December 2017
STAC recently tested two clusters against the STAC-A3 benchmarks. Both ran the STAC-A3 Pack for Python (Rev A) with Python 3.6.2 and the High Performance Analytics Toolkit (HPAT) on Google Cloud 64-vCPU Intel Xeon (Skylake) instances with 14TB of Google Persistent SSD. One cluster (HPAT171028) contained 5 compute nodes, while the other (HPAT171029) contained 20.
The STAC Pack (benchmark implementation code) for this STAC-A3 project was authored by Intel Labs using the open source High Performance Analytics Toolkit (HPAT). HPAT is a new data analytics framework that scales Python-based analytics code by compiling a subset of Python (Pandas/Numpy) to efficient parallel binaries with MPI, and it requires only a small amount of code annotation. Unlike many big data frameworks, HPAT does not use a master/executor paradigm or require the use of JVMs.
The reports are available here:
The configuration details, implementation code, and test-harness software are available to firms with appropriate subscriptions.
STAC-A3 simulates workloads common in the refinement and backtesting of trading strategies. As a rate-limiting step in a firm's response to changing market conditions, the performance of backtesting infrastructure has a top-line impact. Several trading firms drove the requirements for STAC-A3 in order to facilitate software and hardware comparisons. Like other STAC Benchmarks, STAC-A3 is agnostic to architecture.
The STAC Reports contain dozens of results. Of these, Google and Intel wished to highlight the following:
- Compared to a previously reported system using Scala on I/O-accelerated Spark on 5 Broadwell-based GCP nodes (SUT ID: LEVX170603), the 5-node cluster had:
  - 2.4x the throughput at 50 instruments/50 simulations
  - 33% higher throughput in STAC-A3.β1.SWEEP.SPEED1
  - Over 2x the storage efficiency
- Throughput of the 20-node cluster was 3.9x that of the 5-node cluster at 50 instruments/50 simulations (perfect linear scaling would be 4x)
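The scaling figure above can be restated as an efficiency: the observed speedup divided by the ideal linear speedup. The following is a minimal sketch of that arithmetic only; the function name and parameters are hypothetical and are not part of STAC-A3 or HPAT.

```python
def scaling_efficiency(speedup: float, base_nodes: int, scaled_nodes: int) -> float:
    """Return observed speedup as a fraction of ideal linear speedup.

    Hypothetical helper for illustration; not part of any STAC tooling.
    """
    # With perfect linear scaling, throughput grows in proportion to node count.
    ideal_speedup = scaled_nodes / base_nodes
    return speedup / ideal_speedup


# Figures quoted in the report: 3.9x throughput going from 5 to 20 nodes.
efficiency = scaling_efficiency(speedup=3.9, base_nodes=5, scaled_nodes=20)
print(f"{efficiency:.1%}")  # prints 97.5%
```

On these numbers, the 20-node cluster delivered about 97.5% of perfect linear scaling at 50 instruments/50 simulations.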
This report also contains a proposed but unofficial analysis of the overall price-performance of the solution with respect to cloud costs.
STAC-A3 work is ongoing. If you'd like to be involved, please let us know at the STAC Backtesting SIG site.
For information on premium subscriptions, please contact us.