LLM inference on Paperspace cloud with 8x NVIDIA A100 GPUs running the Llama-3.1-70B-Instruct model

Type: Audited

Specs: STAC-AI™ LANG6

Stack under test:

  • STAC-AI™ Reference Implementation for vLLM OpenAI Server
  • vllm/vllm-openai:v0.5.5 Docker Container
  • Python 3.11.7, CUDA 12.2
  • Ubuntu Linux 20.04.3 LTS
  • Paperspace Cloud A100-80Gx8 VM
    • 8 x NVIDIA A100-SXM4-80GB GPUs
    • 2 x Intel® Xeon® Gold 6342 CPU @ 2.80 GHz
    • 720 GiB of virtualized memory
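The container in the stack above is the standard vLLM OpenAI-compatible server image, which takes vLLM engine arguments directly. A minimal launch sketch for this configuration is shown below; the exact flags, mount paths, and the `$HF_TOKEN` variable are assumptions for illustration, not the audited command line.

```shell
# Sketch: serving Llama-3.1-70B-Instruct across all 8 A100s with the
# vllm/vllm-openai:v0.5.5 image (flags are illustrative assumptions).
docker run --gpus all \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -e HUGGING_FACE_HUB_TOKEN="$HF_TOKEN" \
  -p 8000:8000 \
  vllm/vllm-openai:v0.5.5 \
  --model meta-llama/Llama-3.1-70B-Instruct \
  --tensor-parallel-size 8
```

With `--tensor-parallel-size 8`, the 70B model's weights are sharded across the eight 80 GB GPUs; the server then exposes an OpenAI-compatible API on port 8000.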

