How to run on an RTX 5090 / Blackwell?
#27 opened by celsowm
I’ve been struggling to get it running on an RTX 5090 with 32 GB of VRAM. The official Docker images from Tencent don’t seem to be compatible with the Blackwell architecture. I even tried building vLLM from source via git clone, but no luck either.
Any hints?
Hi celsowm,
We've updated the vLLM Docker image to CUDA 12.4 on top of the official vLLM base image. Could you check whether it's compatible with Blackwell?
That said, since you already tried a source build, the updated image may not work either.
What error are you getting? Could you paste the full error log here?
Also, for a 5090 with 32 GB of VRAM, that's too little to run an 80 GB model, even with int4 quantization.
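As a rough sanity check of the VRAM claim, here is a back-of-the-envelope estimate, assuming the model in question has on the order of 80B parameters (that assumption, and the helper `weight_vram_gb`, are illustrative, not from this thread):

```python
def weight_vram_gb(n_params: float, bits_per_weight: int) -> float:
    """Approximate GiB needed just for the weights.

    Ignores KV cache, activations, and runtime overhead, which only
    make the real requirement larger.
    """
    return n_params * bits_per_weight / 8 / 1024**3


# Assumption: ~80B parameters (not stated explicitly in the thread).
params = 80e9
print(f"bf16 weights: {weight_vram_gb(params, 16):.0f} GiB")  # ~149 GiB
print(f"int4 weights: {weight_vram_gb(params, 4):.0f} GiB")   # ~37 GiB
```

Even at int4, the weights alone exceed the 5090's 32 GiB before counting the KV cache, so the model would need multiple GPUs or CPU offload regardless of the Blackwell compatibility issue.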