Suggested command fails to start with vLLM 0.8.1

#51
by dinerburger - opened

I'm getting the following error on vLLM 0.8.1:

ValueError: lora_extra_vocab_size (0) must be one of (256, 512).

With the following command:

"${HOME}/vllm/bin/python" -m vllm.entrypoints.openai.api_server \
  --port 9099 \
  --model "${HOME}/models/textgen/Phi-4-multimodal-instruct" \
  --dtype auto \
  --gpu-memory-utilization 0.97 \
  --trust-remote-code \
  --max-model-len 43875 \
  --enable-lora \
  --lora-extra-vocab-size 0 \
  --max-lora-rank 320 \
  --limit-mm-per-prompt audio=3,image=3 \
  --max-loras 2 \
  --lora-modules speech="${HOME}/models/textgen/Phi-4-multimodal-instruct/speech-lora" vision="${HOME}/models/textgen/Phi-4-multimodal-instruct/vision-lora"

Removing the `--lora-extra-vocab-size` argument seems to resolve the error, but inference degrades into gibberish after around 64 tokens. Is there a specific vLLM version we should be using?

Thanks again.

EDIT: I also experience this on vLLM 0.8.0 and nightly.

I ran into the same problem. Following the hint, I adjusted the value to 256 and the service started, but, and here's the catch, it doesn't work properly in use: the model is only correct at the very beginning, then keeps spitting out garbled text and won't stop. Has anyone else seen this?
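For reference, the workaround described above amounts to replacing the rejected value with one of the two the error message accepts. A minimal sketch, reusing the paths from the original command (the 64-token gibberish issue may persist even with this change):

```shell
# vLLM 0.8.x rejects --lora-extra-vocab-size 0; 256 and 512 are the
# only accepted values per the error message. This only gets the
# server to start; it does not address the incoherent-output issue.
"${HOME}/vllm/bin/python" -m vllm.entrypoints.openai.api_server \
  --port 9099 \
  --model "${HOME}/models/textgen/Phi-4-multimodal-instruct" \
  --dtype auto \
  --gpu-memory-utilization 0.97 \
  --trust-remote-code \
  --max-model-len 43875 \
  --enable-lora \
  --lora-extra-vocab-size 256 \
  --max-lora-rank 320 \
  --limit-mm-per-prompt audio=3,image=3 \
  --max-loras 2 \
  --lora-modules speech="${HOME}/models/textgen/Phi-4-multimodal-instruct/speech-lora" vision="${HOME}/models/textgen/Phi-4-multimodal-instruct/vision-lora"
```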


Yes I had the same problem. 😞 Results seem to become incoherent after 64 tokens or so.


Same here, it just keeps emitting garbled, repetitive output.
