- SUT ID: STAC250211
- Title: STAC-AI LLM Model Serving Platform Comparison
- Type: Vault Report
- Specs: STAC-AI™ LANG6
This study evaluates two model-serving platforms, vLLM and Hugging Face's Text Generation Inference (TGI), for large language model (LLM) inference using the STAC-AI™ LANG6 (Inference-Only) Test Harness. The STAC-AI™ benchmark provides industry-standard testing to assess the performance, efficiency, and reliability of LLM inference infrastructure under real-world conditions. We analyze Inference Rate, Throughput, and Fidelity across four workloads, incorporating short- and long-context datasets with both 8B- and 70B-parameter models. Key findings highlight differences in Inference Rate, the impact of platform versions on serving efficiency, variations in response consistency, and patterns of non-determinism in generated outputs. These insights offer practical guidance for firms optimizing their LLM inference infrastructure.
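To make the measured quantities concrete, the following is a minimal sketch, not the STAC-AI™ LANG6 harness itself, of how per-request throughput and output non-determinism can be probed against an OpenAI-compatible completions endpoint such as the one served by vLLM (TGI also exposes a compatible route in recent versions). The endpoint URL, model name, prompt, and run count below are placeholder assumptions.

```python
"""Hypothetical probe: repeat one greedy-decoded request and compare outputs."""
import time
import requests

ENDPOINT = "http://localhost:8000/v1/completions"   # placeholder serving endpoint
MODEL = "meta-llama/Llama-3.1-8B-Instruct"           # placeholder model name
PROMPT = "Summarize the key risks of output non-determinism in one paragraph."
RUNS = 5

outputs, latencies, token_counts = [], [], []
for _ in range(RUNS):
    start = time.perf_counter()
    resp = requests.post(
        ENDPOINT,
        json={
            "model": MODEL,
            "prompt": PROMPT,
            "max_tokens": 256,
            "temperature": 0.0,  # greedy decoding: repeated runs *should* match
        },
        timeout=300,
    )
    resp.raise_for_status()
    elapsed = time.perf_counter() - start
    body = resp.json()
    outputs.append(body["choices"][0]["text"])
    latencies.append(elapsed)
    token_counts.append(body["usage"]["completion_tokens"])

# Rough per-request throughput in generated tokens per second.
for i, (toks, sec) in enumerate(zip(token_counts, latencies)):
    print(f"run {i}: {toks} tokens in {sec:.2f}s ({toks / sec:.1f} tok/s)")

# Crude consistency check: greedy runs that differ indicate non-determinism.
print("identical outputs across runs:", len(set(outputs)) == 1)
```

The actual benchmark defines Inference Rate, Throughput, and Fidelity far more rigorously and across full workload mixes; this sketch only illustrates the kind of repeated-request measurement the abstract refers to.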