STAC Research Report: LLM Model Serving Platform Comparison

A comparison of model-serving platforms

13 February 2025

We recently conducted a study comparing two model-serving platforms, vLLM and Hugging Face's text-generation-inference (TGI), for LLM inference using the STAC-AI™ LANG6 (Inference-Only) Test Harness.

The STAC-AI™ benchmark provides rigorous, industry-standard testing for LLM inference infrastructure, helping firms assess performance, efficiency, and reliability in real-world conditions.

This research note examines Inference Rate, Throughput, and Fidelity, offering valuable insights for firms optimizing their LLM infrastructure.

We tested four workloads, combining short- and long-context datasets with both 8B and 70B parameter models. Key areas of analysis include:

  • Differences in Inference Rate across platforms.
  • The impact of platform versions on serving efficiency.
  • Variations in response consistency.
  • Patterns of non-determinism and how they manifest in generated outputs.
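The note does not publish STAC's formal metric definitions, so as an illustration only, here is a minimal sketch of how a tokens-per-second rate and a repeated-run consistency score could be computed. The function names and formulas below are assumptions for exposition, not STAC's actual methodology:

```python
from collections import Counter

def tokens_per_second(total_tokens: int, elapsed_seconds: float) -> float:
    # Illustrative throughput metric: generated tokens divided by wall-clock time.
    # (Assumed formula; not STAC's published definition.)
    return total_tokens / elapsed_seconds

def consistency_rate(outputs: list[str]) -> float:
    # Fraction of repeated runs that match the most common output.
    # 1.0 means fully repeatable; lower values indicate non-determinism.
    if not outputs:
        return 0.0
    most_common_count = Counter(outputs).most_common(1)[0][1]
    return most_common_count / len(outputs)

# Hypothetical example: five repeated generations for the same prompt,
# one of which drifts in wording.
runs = [
    "The rate is 4%.",
    "The rate is 4%.",
    "The rate is 4%.",
    "The rate was 4%.",
    "The rate is 4%.",
]
print(tokens_per_second(1024, 8.0))  # 128.0 tokens/s
print(consistency_rate(runs))        # 0.8
```

A fully deterministic serving stack would score 1.0 on the consistency measure; platforms that batch requests dynamically or sample with nonzero temperature typically score lower.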

Premium subscribers have access to this research note, which includes visualizations of all test results and configuration information for the solutions tested. Subscribers can also run STAC-AI benchmarks in the privacy of their own labs to test their own systems. To learn about subscription options, please contact us.

