Suggested command fails to start with vLLM 0.8.1
#51 · opened by dinerburger
I'm getting the following error on vLLM 0.8.1:
ValueError: lora_extra_vocab_size (0) must be one of (256, 512).
With the following command:
"${HOME}/vllm/bin/python" -m vllm.entrypoints.openai.api_server \
--port 9099 \
--model "${HOME}/models/textgen/Phi-4-multimodal-instruct" \
--dtype auto \
--gpu-memory-utilization 0.97 \
--trust-remote-code \
--max-model-len 43875 \
--enable-lora \
--lora-extra-vocab-size 0 \
--max-lora-rank 320 \
--limit-mm-per-prompt audio=3,image=3 \
--max-loras 2 \
--lora-modules speech="${HOME}/models/textgen/Phi-4-multimodal-instruct/speech-lora" vision="${HOME}/models/textgen/Phi-4-multimodal-instruct/vision-lora"
Removing the --lora-extra-vocab-size argument seems to resolve the error, but inference breaks down into gibberish after around 64 tokens. Is there a specific vLLM version we should be using?
Thanks again.
EDIT: I also experience this on vLLM 0.8.0 and nightly.
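For anyone trying to reproduce this, one quick way to confirm which vLLM build the venv from the command above is actually running (path taken from that command) is:

"${HOME}/vllm/bin/python" -c "import vllm; print(vllm.__version__)"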
I ran into the same problem. Following the hint in the error message, I changed the value to 256 and the error went away. But, and here comes the but: the server does start, yet it doesn't work properly. The model is only correct at the very beginning; after that it keeps spitting out garbled text and won't stop. Has anyone else run into this?
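For reference, the workaround described above amounts to a one-flag change to the command in the original post, i.e. replacing --lora-extra-vocab-size 0 with:

    --lora-extra-vocab-size 256 \

(256 and 512 are the only values the error message accepts; as reported in this thread, the server then starts, but generation still degrades into garbage.)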
Yes, I had the same problem. 😞 Results seem to become incoherent after 64 tokens or so.
Same here: nonstop garbled, repetitive output.