STAC Research Note: Performance And Efficiency Comparison Between Self-Hosted LLMs And API Services

Type: Research Note

Specs: STAC-AI™ LANG6

This study evaluates two approaches to deploying LLMs, self-hosting and using an API provider, with the STAC-AI™ LANG6 (Inference-Only) Test Harness. The STAC-AI™ benchmark provides industry-standard tests for assessing the performance, efficiency, and reliability of LLM inference infrastructure under real-world conditions. We analyze the latency and efficiency of pairs consisting of a self-hosted model and the same or an equivalent API-hosted model. We also analyze potential variation in the latency of API services. These insights offer practical guidance for firms optimizing their LLM infrastructure.
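To illustrate the kind of latency-variation analysis described above, the sketch below summarizes per-request latencies with mean, median, and p99 statistics. This is not the LANG6 harness itself; the function name and the latency samples are hypothetical placeholders chosen to contrast a stable self-hosted deployment with an API service showing occasional slow outliers.

```python
# Illustrative sketch (not the STAC-AI LANG6 harness): quantifying
# latency variation across repeated inference requests.
import statistics


def latency_summary(latencies_ms):
    """Summarize per-request latencies: mean, median, and p99 (in ms)."""
    ordered = sorted(latencies_ms)
    # Nearest-rank p99; clamp to the last sample for small batches.
    p99_index = min(len(ordered) - 1, int(0.99 * len(ordered)))
    return {
        "mean_ms": statistics.fmean(ordered),
        "median_ms": statistics.median(ordered),
        "p99_ms": ordered[p99_index],
    }


# Hypothetical samples: a self-hosted model (stable) vs. an API service
# with occasional slow outliers (e.g. from shared-tenancy contention).
self_hosted = [102, 98, 101, 99, 103, 100, 97, 104, 101, 100]
api_service = [85, 90, 88, 310, 87, 92, 89, 275, 86, 91]

print(latency_summary(self_hosted))
print(latency_summary(api_service))
```

A tail statistic such as p99 matters here because two services with similar mean latency can differ sharply in worst-case behavior, which is often what matters in production.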


The STAC-AI Working Group focuses on benchmarking artificial intelligence (AI) technologies in finance. This includes deep learning, large language models (LLMs), and other AI-driven approaches that help firms unlock new efficiencies and insights.