AMD Instinct MI210 + vLLM fails to run this model, any solutions please? Are there any other DeepSeek-R1-671B models that can run successfully on AMD Instinct MI210 + vLLM? Thanks!
Error message:
File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 372, in init
assert self.quant_method is not None
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError
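For context, this assert fires when the quantization config does not hand back a quant method for the fused-MoE layer. Below is a simplified sketch of that pattern; it is not vLLM's actual code, just an illustration of how quant_method can end up None when a quantization scheme has no fused-MoE path on a given backend:

# Simplified illustration only -- not vLLM's actual implementation.
from typing import Optional


class FusedMoE:
    """Minimal stand-in for a fused-MoE layer constructor."""

    def __init__(self, quant_config) -> None:
        # Ask the quantization config for a per-layer quantize method.
        self.quant_method = quant_config.get_quant_method(self, prefix="moe")
        # The assert from the traceback above: fails if nothing was returned.
        assert self.quant_method is not None


class AWQConfigWithoutMoESupport:
    """Stands in for a quantization config whose fused-MoE path isn't wired up."""

    def get_quant_method(self, layer, prefix: str) -> Optional[object]:
        if isinstance(layer, FusedMoE):
            return None  # no AWQ fused-MoE kernel available on this backend
        return object()  # dense linear layers would still get a real method


FusedMoE(AWQConfigWithoutMoESupport())  # raises AssertionError, as in the log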
vLLM version? start up command? OS env?
Hi!
@v2ray
Here are the details of the vLLM version and OS env: https://github.com/vllm-project/vllm/issues/16386
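For reference, the versions inside the container can be double-checked with a short snippet (on a ROCm build of PyTorch, torch.version.hip should print a HIP version string rather than None):

# Quick version check inside the container.
import torch
import vllm

print("vllm :", vllm.__version__)
print("torch:", torch.__version__)
print("hip  :", torch.version.hip)  # None would indicate a non-ROCm PyTorch build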
My start-up commands are:
Start a docker container:
docker run -it --rm --ipc=host --cap-add=CAP_SYS_ADMIN --device=/dev/kfd --device=/dev/dri/card5 --device=/dev/mem --group-add render --cap-add=SYS_PTRACE --security-opt seccomp=unconfined \
  --network host \
  --name dsr1awq \
  --shm-size 896g \
  -v "/root/models:/models" \
  --privileged \
  -p 6381:6381 \
  -p 1001:1001 \
  -p 2001:2001 \
  -e NCCL_IB_HCA=mlx5 \
  -e NCCL_P2P_DISABLE=1 \
  vllm-dsr1:v1 bash

Inside the started container, run the model:
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m vllm.entrypoints.openai.api_server --model /models/DeepSeek-R1-awq --tensor-parallel-size 8 --port 1001 --enforce_eager --distributed-executor-backend mp --pipeline-parallel-size 1 --max-model-len 1024 --dtype float16 --max-num-batched-tokens 1024 --trust-remote-code --enable-prefix-caching
The docker image vllm-dsr1:v1 is an alias of rocm/vllm:rocm6.3.1_instinct_vllm0.7.3_20250311.
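In case it helps narrow things down, the checkpoint's quantization metadata can be inspected like this (the path matches the --model argument above; the exact keys depend on how the AWQ checkpoint was exported):

# Sanity-check the quantization block of the checkpoint's config.json.
import json

with open("/models/DeepSeek-R1-awq/config.json") as f:
    cfg = json.load(f)

# Expect something along the lines of {"quant_method": "awq", ...}.
print(cfg.get("quantization_config"))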
Thanks for your help!
vllm0.7.3
Try building from source.
Hi! Even when building from source, I still hit the same assertion error. See https://github.com/vllm-project/vllm/issues/15101
Welp, then I have no idea. I only tested it on CUDA hardware.