AMD Instinct MI210 + vLLM fails to run this model. Any solutions, please? Are there any other deepseek-r1-671b models that can run successfully on AMD Instinct MI210 + vLLM? Thanks!

#33 by luciagan

Error message:

File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 372, in init
assert self.quant_method is not None
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError
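
For context, line 372 sits at the end of FusedMoE's constructor: vLLM asks the checkpoint's quantization config (here AWQ) for a fused-MoE kernel implementation via get_quant_method, and when the config has no such implementation for the current platform, the lookup returns None and the assert fires. Below is a minimal, self-contained sketch of that failure mode using stub classes, not vLLM's real code; only the get_quant_method call and the assert mirror upstream, everything else is illustrative:

    # Stub reproduction of the assert above; NOT vLLM source code.
    class UnquantizedFusedMoEMethod:
        """Stand-in for the plain fp16/bf16 MoE kernels used without quantization."""

    class AWQConfigStub:
        """Stand-in for an AWQ config on a build without AWQ fused-MoE kernels."""
        def get_quant_method(self, layer, prefix):
            # Linear layers would get an AWQ kernel method here; for the
            # fused-MoE layer no implementation exists, so the lookup is empty.
            return None

    class FusedMoE:
        def __init__(self, quant_config=None, prefix=""):
            if quant_config is None:
                self.quant_method = UnquantizedFusedMoEMethod()
            else:
                self.quant_method = quant_config.get_quant_method(self, prefix)
            assert self.quant_method is not None  # <- the AssertionError above

    FusedMoE(quant_config=AWQConfigStub())  # raises AssertionError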

Cognitive Computations org

vLLM version? Startup command? OS env?

Hi @v2ray!
Here are the details of the vLLM version and OS env: https://github.com/vllm-project/vllm/issues/16386

My startup commands are:

  1. Start a docker container:
    docker run -it --rm --ipc=host \
      --cap-add=CAP_SYS_ADMIN --cap-add=SYS_PTRACE \
      --security-opt seccomp=unconfined \
      --device=/dev/kfd --device=/dev/dri/card5 --device=/dev/mem \
      --group-add render \
      --network host \
      --name dsr1awq \
      --shm-size 896g \
      -v "/root/models:/models" \
      --privileged \
      -p 6381:6381 -p 1001:1001 -p 2001:2001 \
      -e NCCL_IB_HCA=mlx5 \
      -e NCCL_P2P_DISABLE=1 \
      vllm-dsr1:v1 bash

  2. In the started docker container, run the model:
    CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m vllm.entrypoints.openai.api_server \
      --model /models/DeepSeek-R1-awq \
      --tensor-parallel-size 8 \
      --pipeline-parallel-size 1 \
      --distributed-executor-backend mp \
      --port 1001 \
      --enforce_eager \
      --max-model-len 1024 \
      --max-num-batched-tokens 1024 \
      --dtype float16 \
      --trust-remote-code \
      --enable-prefix-caching

The docker image vllm-dsr1:v1 is an alias of rocm/vllm:rocm6.3.1_instinct_vllm0.7.3_20250311.
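
Since the docker run above only maps /dev/dri/card5 explicitly (--privileged normally exposes the rest), one thing worth ruling out is whether the container actually sees all eight MI210s before the server starts. A quick check, assuming the image ships a ROCm build of PyTorch, where HIP devices are reported through the torch.cuda API:

    # Device-visibility check inside the container. On ROCm PyTorch,
    # HIP GPUs show up under the torch.cuda namespace.
    import torch

    n = torch.cuda.device_count()
    print("visible devices:", n)  # expect 8 for --tensor-parallel-size 8
    for i in range(n):
        print(i, torch.cuda.get_device_name(i))  # expect AMD Instinct MI210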

Thanks for your help!

Cognitive Computations org

vllm0.7.3

Try building from source.

Hi! Even a build from source hits the same assertion error. See https://github.com/vllm-project/vllm/issues/15101
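
(For anyone landing here with the same assert: a quick sanity check is to print the quantization scheme the checkpoint itself declares, since that is the config vLLM resolves quant_method from. A sketch; the model path matches the launch command earlier in the thread:)

    # Print the quantization block vLLM reads from the checkpoint.
    import json

    with open("/models/DeepSeek-R1-awq/config.json") as f:
        cfg = json.load(f)

    # An AWQ checkpoint should declare {"quant_method": "awq", ...}; the
    # fused-MoE layers fail when that scheme has no MoE kernels for the platform.
    print(cfg.get("quantization_config"))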

Cognitive Computations org

Welp, then I have no idea. I only tested it on CUDA hardware.
