STAC Report: STAC-ML on Myrtle.ai VOLLO

First STAC-ML project using FPGAs as accelerators

14 December 2022

STAC recently performed the first STAC-ML™ Markets (Inference) benchmark tests on a solution using FPGAs as accelerators. Myrtle.ai submitted a solution based on their VOLLO Accelerator running on BittWare cards with Intel® Agilex™ FPGAs. The STAC Report is now available here.

STAC-ML Markets (Inference) is the technology benchmark standard for solutions that can be used to run inference on real-time market data. Designed by quants and technologists from some of the world's leading financial firms, the benchmarks test the latency, throughput, realized precision, energy efficiency, and space efficiency of a technology stack across three model sizes and different numbers of model instances (NMI). In this project, we ran the fixed-window benchmark suite (code-named Sumaco).

The stack consisted of the STAC-ML™ Pack for Myrtle.ai VOLLO™ (Rev A) using the Myrtle.ai VOLLO SDK v0.1.0 to control the VOLLO Accelerator v0.1.0 application loaded onto 4 x BittWare IA-840f (Intel® Agilex™ AGF027 FPGA) Cards in a BittWare TeraBox™ 1402B Server.

Myrtle.ai wished to highlight several results from this report:

99th percentile (99p) latencies across 1, 2, 3 & 4 NMI were:

24.0-24.1 μsec for LSTM_A (the smallest model)
(STAC-ML.Markets.Inf.S.LSTM_A.[1, 2, 3, 4].LAT.v1)
64.8 μsec for LSTM_B
(STAC-ML.Markets.Inf.S.LSTM_B.[1, 2, 3, 4].LAT.v1)
1.35 ms for LSTM_C (the largest model)
(STAC-ML.Markets.Inf.S.LSTM_C.[1, 2, 3, 4].LAT.v1)

For LSTM_A with 48 NMI:

Total throughput exceeded 650K inf/sec
(STAC-ML.Markets.Inf.S.LSTM_A.48.TPUT.v1)
Space efficiency exceeded 646K inf/sec/cubic foot
(STAC-ML.Markets.Inf.S.LSTM_A.48.SPACE_EFF.v1)
Energy efficiency exceeded 1.18M inf/sec/kW
(STAC-ML.Markets.Inf.S.LSTM_A.48.ENERG_EFF.v1)
The 99p latency (73.9 μsec) was 3.1x the 99p latency of 1 NMI
(STAC-ML.Markets.Inf.S.LSTM_A.[1, 48].LAT.v1)
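
The throughput and efficiency figures above are related by simple division. The sketch below assumes that space efficiency is total throughput divided by occupied volume and that energy efficiency is total throughput divided by power draw; the report summary does not spell out these definitions, so they are inferred from the units. Because each quoted figure is a floor ("exceeded"), the implied volume and power are rough estimates only, not measured values from the report.

```python
# Floors quoted in the summary for LSTM_A with 48 NMI.
throughput_inf_per_sec = 650_000        # "exceeded 650K inf/sec"
space_eff_inf_per_sec_ft3 = 646_000     # "exceeded 646K inf/sec/cubic foot"
energy_eff_inf_per_sec_kw = 1_180_000   # "exceeded 1.18M inf/sec/kW"

# Assumed definitions (inferred from the units, not stated in the summary):
#   space efficiency  = throughput / volume
#   energy efficiency = throughput / power
implied_volume_ft3 = throughput_inf_per_sec / space_eff_inf_per_sec_ft3
implied_power_kw = throughput_inf_per_sec / energy_eff_inf_per_sec_kw

print(f"Implied volume: ~{implied_volume_ft3:.2f} cubic feet")
print(f"Implied power:  ~{implied_power_kw:.2f} kW")
```

Under these assumptions the figures are consistent with a system occupying roughly one cubic foot and drawing somewhat more than half a kilowatt during the test.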

For LSTM_B with 16 NMI:

The 99p latency (147 μsec) was 2.3x the 99p latency of 1 NMI
(STAC-ML.Markets.Inf.S.LSTM_B.[1, 16].LAT.v1)

Across all models and NMI tested:

The widest percentage spread from median to 99p latency was 7% (26.5 μsec to 28.4 μsec for STAC-ML.Markets.Inf.S.LSTM_A.12.LAT.v1)
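
The latency multiples and the median-to-99p spread quoted above can be checked with a few lines of arithmetic. This is a minimal sketch using only the figures in this summary; the helper names `ratio` and `spread_pct` are illustrative, and the 1 NMI baseline for LSTM_A is taken as 24.0 μsec, the lower end of the quoted 24.0-24.1 μsec range.

```python
# 99th-percentile latencies quoted in the summary, in microseconds.
lstm_a_1nmi_99p = 24.0    # assumed baseline: lower end of the 24.0-24.1 range
lstm_a_48nmi_99p = 73.9
lstm_b_1nmi_99p = 64.8
lstm_b_16nmi_99p = 147.0

# Median and 99p latency for LSTM_A with 12 NMI, in microseconds.
lstm_a_12nmi_median = 26.5
lstm_a_12nmi_99p = 28.4


def ratio(scaled, baseline):
    """Return how many times larger `scaled` is than `baseline`."""
    return scaled / baseline


def spread_pct(median, p99):
    """Return the median-to-99p spread as a percentage of the median."""
    return (p99 - median) / median * 100.0


print(f"LSTM_A, 48 NMI vs 1 NMI: {ratio(lstm_a_48nmi_99p, lstm_a_1nmi_99p):.1f}x")
print(f"LSTM_B, 16 NMI vs 1 NMI: {ratio(lstm_b_16nmi_99p, lstm_b_1nmi_99p):.1f}x")
print(f"LSTM_A, 12 NMI spread:   {spread_pct(lstm_a_12nmi_median, lstm_a_12nmi_99p):.0f}%")
```

The ratios round to the 3.1x and 2.3x figures reported above, and the spread rounds to 7%.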

Premium subscribers have access to extensive visualizations of all test results, the micro-detailed configuration information for the solutions tested, the code used in this project, and the ability to run these same benchmarks in the privacy of their own labs. To learn about subscription options, please contact us.
