LLM inference on the Paperspace cloud with 8 x NVIDIA A100 GPUs running the Llama-3.1-8B-Instruct model

Type: Audited

Specs: STAC-AI™ LANG6

Stack under test:

  • STAC-AI™ Reference Implementation for vLLM OpenAI Server
  • vllm/vllm-openai:v0.5.5 Docker Container
  • Python 3.11.7, CUDA 12.2
  • Ubuntu Linux 22.04.3 LTS
  • Paperspace Cloud A100-80Gx8 VM
    • 8 x NVIDIA A100-SXM4-80GB GPUs
    • 2 x Intel® Xeon® Gold 6342 CPU @ 2.80 GHz
    • 720 GiB of virtualized memory
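The stack above can be reproduced in outline by launching the vLLM OpenAI-compatible server from the listed container. The command below is a minimal sketch, not the audited configuration: the Hugging Face model ID, port, cache mount, and the choice of tensor parallelism across all 8 GPUs are assumptions.

```shell
# Sketch: serve Llama-3.1-8B-Instruct with the container version under test.
# --tensor-parallel-size 8 shards the model across all 8 A100s (assumed here;
# the audited run's parallelism settings are in the full report).
docker run --gpus all \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -p 8000:8000 \
  vllm/vllm-openai:v0.5.5 \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --tensor-parallel-size 8
```

Once running, the server exposes the standard OpenAI REST API (e.g. POST /v1/chat/completions on port 8000), which is what the STAC-AI Reference Implementation drives during the LANG6 tests.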


The STAC-AI Working Group focuses on benchmarking artificial intelligence (AI) technologies in finance. This includes deep learning, large language models (LLMs), and other AI-driven approaches that help firms unlock new efficiencies and insights.