STAC Research Note: Performance And Efficiency Comparison Between Self-Hosted LLMs And API Services

21 April 2025
We recently conducted a study comparing self-hosted LLMs and equivalent API models using the STAC-AI™ LANG6 (Inference-Only) Test Harness.
The STAC-AI™ benchmark provides rigorous, industry-standard testing for LLM inference infrastructure, helping firms assess performance, efficiency, and reliability in real-world conditions.
This research note examines the inference performance and efficiency of self-hosted models and compares them to 1) the same models available through API services (same-model comparison) and 2) a different closed-source model of similar capability (cross-model comparison).
We made three comparisons: two same-model and one cross-model. The models in each comparison were given the same Interactive workload. Key areas of analysis include:
- Differences in latency (Reaction Time, Response Time, and Output Rate) at various arrival rates.
- Differences in inference efficiency at various arrival rates.
We also examined the performance variation of models available through API services. We ran our benchmarks on API models periodically over an extended period to analyze intraday and intraweek performance trends for two LLM API services.
STAC subscribers can access the full report here. Subscribers can also run STAC-AI benchmarks in the privacy of their own labs to test their systems. To learn about subscription options, please contact us.