LLM inference on the Paperspace cloud with 8x NVIDIA H100 GPUs running the Llama-3.1-70B-Instruct model

Type: Audited

Specs: STAC-AI™ LANG6

Stack under test:

  • STAC-AI™ Reference Implementation for vLLM OpenAI Server
  • vllm/vllm-openai:v0.5.5 Docker Container
  • Python 3.11.7, CUDA 12.2
  • Ubuntu Linux 20.04.3 LTS
  • Paperspace Cloud H100x8 VM
    • 8 x NVIDIA H100-80GB-HBM3 GPUs
    • 2 x Intel® Xeon® Platinum 8458P CPU @ 2.70 GHz
    • 1.6TB of virtualized memory
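A stack like the one above is typically launched by running the vLLM OpenAI-compatible server container with tensor parallelism across all eight GPUs. The command below is a minimal illustrative sketch, not the audited STAC configuration; port, cache path, and shared-memory settings are assumptions.

```shell
# Launch the vLLM OpenAI server container (v0.5.5) with the model
# sharded across 8 GPUs via tensor parallelism.
# Assumes the NVIDIA Container Toolkit is installed and a Hugging Face
# token with access to the gated Llama weights is in $HF_TOKEN.
docker run --gpus all \
    --shm-size 16g \
    -p 8000:8000 \
    -v "$HOME/.cache/huggingface:/root/.cache/huggingface" \
    -e HF_TOKEN="$HF_TOKEN" \
    vllm/vllm-openai:v0.5.5 \
    --model meta-llama/Llama-3.1-70B-Instruct \
    --tensor-parallel-size 8
```

Once running, the server exposes an OpenAI-compatible API on port 8000 (e.g. `POST /v1/chat/completions`), which is the interface the STAC-AI reference implementation drives.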
