Suggested command fails to start with vLLM 0.8.1

#51
by dinerburger - opened

I'm getting the following error on vLLM 0.8.1:

ValueError: lora_extra_vocab_size (0) must be one of (256, 512).

With the following command:

"${HOME}/vllm/bin/python" -m vllm.entrypoints.openai.api_server \
  --port 9099 \
  --model "${HOME}/models/textgen/Phi-4-multimodal-instruct" \
  --dtype auto \
  --gpu-memory-utilization 0.97 \
  --trust-remote-code \
  --max-model-len 43875 \
  --enable-lora \
  --lora-extra-vocab-size 0 \
  --max-lora-rank 320 \
  --limit-mm-per-prompt audio=3,image=3 \
  --max-loras 2 \
  --lora-modules speech="${HOME}/models/textgen/Phi-4-multimodal-instruct/speech-lora" vision="${HOME}/models/textgen/Phi-4-multimodal-instruct/vision-lora"

Removing the `--lora-extra-vocab-size` argument seems to resolve the error, but inference degrades into gibberish after around 64 tokens. Is there a specific vLLM version we should be using?

Thanks again.

EDIT: I also experience this on vLLM 0.8.0 and nightly.

I ran into the same problem. Following the hint, I adjusted the value to 256 and the service started, but, and here's the catch, it doesn't work properly in use: the model is only correct at the very beginning, then keeps spitting out garbled text and won't stop. Has anyone else seen this?
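For reference, the workaround described above amounts to replacing the rejected value with one of the two the error message accepts. A minimal sketch, reusing the paths from the original command (the 64-token gibberish issue may persist even with this change):

```shell
# vLLM 0.8.x rejects --lora-extra-vocab-size 0; 256 and 512 are the
# only accepted values per the error message. This only gets the
# server to start; it does not address the incoherent-output issue.
"${HOME}/vllm/bin/python" -m vllm.entrypoints.openai.api_server \
  --port 9099 \
  --model "${HOME}/models/textgen/Phi-4-multimodal-instruct" \
  --dtype auto \
  --gpu-memory-utilization 0.97 \
  --trust-remote-code \
  --max-model-len 43875 \
  --enable-lora \
  --lora-extra-vocab-size 256 \
  --max-lora-rank 320 \
  --limit-mm-per-prompt audio=3,image=3 \
  --max-loras 2 \
  --lora-modules speech="${HOME}/models/textgen/Phi-4-multimodal-instruct/speech-lora" vision="${HOME}/models/textgen/Phi-4-multimodal-instruct/vision-lora"
```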


Yes I had the same problem. 😞 Results seem to become incoherent after 64 tokens or so.


Same here, it just keeps emitting garbled, repetitive output.
