# vLLM compatibility issue with nanonets/Nanonets-OCR-s: Processor initialization conflict
## Issue Description

The `nanonets/Nanonets-OCR-s` model fails to load in vLLM (v0.9.1) due to a processor configuration conflict, while it works fine with transformers directly.
## Error Details

Main error:

```text
TypeError: Qwen2_5_VLProcessor.__init__() got multiple values for argument 'image_processor'
```
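In Python, this `TypeError` occurs whenever a parameter receives both a positional and a keyword value. A minimal reproduction of the mechanism (illustrative only, not the actual vLLM code path):

```python
def init(image_processor, tokenizer=None):
    pass

# "proc" already binds to image_processor positionally, so the keyword
# argument collides with it and raises the same TypeError as above.
init("proc", image_processor="proc")
# TypeError: init() got multiple values for argument 'image_processor'
```

One plausible reading is that vLLM passes the image processor positionally while the model's processor config injects it again as a keyword argument.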
Additional issue: a UTF-8 decoding error when reading the processor config files:

```text
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb2 in position 8: invalid start byte
```

The processor configuration appears to contain duplicate or conflicting `image_processor` arguments that cause vLLM to fail during initialization.
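A quick way to inspect the uploaded config files for both problems (a diagnostic sketch, not from the original thread; the file names are the standard Hugging Face processor configs and may not all exist in this repo):

```python
import json
from huggingface_hub import hf_hub_download

REPO = "nanonets/Nanonets-OCR-s"

for fname in ("preprocessor_config.json", "processor_config.json",
              "tokenizer_config.json", "chat_template.json"):
    try:
        path = hf_hub_download(REPO, fname)
    except Exception:
        continue  # the repo may not ship this file
    raw = open(path, "rb").read()
    try:
        cfg = json.loads(raw.decode("utf-8"))
    except UnicodeDecodeError as e:
        print(f"{fname}: not valid UTF-8 -> {e}")  # matches the second error above
        continue
    # A duplicated 'image_processor' entry would show up among these keys.
    print(fname, sorted(cfg.keys()))
```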
## Environment

- vLLM version: 0.9.1
- transformers version: Latest
- Model: `nanonets/Nanonets-OCR-s`
- Hardware: NVIDIA A100-PCIE-40GB
## Commands Attempted

```bash
# All of these fail with the same error
vllm serve nanonets/Nanonets-OCR-s --trust-remote-code --limit-mm-per-prompt image=3
vllm serve nanonets/Nanonets-OCR-s --trust-remote-code --enforce-eager
vllm serve nanonets/Nanonets-OCR-s --trust-remote-code --tokenizer-mode slow
```
## Expected Behavior

The model should load successfully in vLLM, just as the base `Qwen/Qwen2.5-VL-3B-Instruct` does.
## Actual Behavior

vLLM fails during processor initialization with the `multiple values for argument 'image_processor'` error.
## Working Alternative

The model works perfectly with transformers directly:

```python
from transformers import AutoModelForImageTextToText, AutoProcessor

model = AutoModelForImageTextToText.from_pretrained("nanonets/Nanonets-OCR-s", trust_remote_code=True)
processor = AutoProcessor.from_pretrained("nanonets/Nanonets-OCR-s", trust_remote_code=True)
```
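Continuing from that snippet, a minimal end-to-end OCR call (a sketch that reuses `model` and `processor` from above; the prompt text, generation length, and `sample.jpg` input are illustrative, not from the original thread):

```python
from PIL import Image

image = Image.open("sample.jpg")
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Extract the text from this document."},
    ],
}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=2048)
# Strip the prompt tokens so only the generated OCR text is printed.
print(processor.batch_decode(out[:, inputs["input_ids"].shape[1]:],
                             skip_special_tokens=True)[0])
```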
## Request

Could you please:
- Fix the processor configuration to be compatible with vLLM
- Ensure all config files use proper UTF-8 encoding
- Test compatibility with vLLM during model releases
This model is very useful for OCR tasks, and vLLM compatibility would be greatly appreciated by the community.
## Full Error Log
```text
ERROR: Qwen2_5_VLProcessor.__init__() got multiple values for argument 'image_processor'
[Full traceback from your error logs...]
```
If anyone has this working on vLLM, I would appreciate tips!
I got this working by installing an older transformers after the vLLM install: `pip install "transformers<4.53.0"`.
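For anyone reproducing this, the order matters, since installing vLLM pulls in a newer transformers that then needs to be downgraded (a sketch; the vLLM version pin matches the one reported above):

```bash
pip install vllm==0.9.1
# Downgrade transformers *after* vLLM so the pin is not overwritten.
pip install "transformers<4.53.0"
vllm serve nanonets/Nanonets-OCR-s --trust-remote-code
```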
It worked for me with Docker, but only for JPG images, not PDFs directly:
```bash
export MODEL_PORT=8000
export MODEL_ID=nanonets/Nanonets-OCR-s

docker run \
  --runtime nvidia \
  -e VLLM_USE_V1=1 \
  --gpus all \
  --ipc=host \
  -p "${MODEL_PORT}:8000" \
  --env "HUGGING_FACE_HUB_TOKEN=${HUGGING_FACE_HUB_TOKEN}" \
  -v "${HF_HOME}:/root/.cache/huggingface" \
  vllm/vllm-openai:latest \
  --model "${MODEL_ID}"
```
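Once the container is up, requests go through vLLM's OpenAI-compatible API. A minimal client sketch (assuming the server above on localhost:8000 and a local `sample.jpg`; the prompt wording is illustrative):

```python
import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# vLLM's OpenAI-compatible server accepts images as base64 data URLs.
with open("sample.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

resp = client.chat.completions.create(
    model="nanonets/Nanonets-OCR-s",
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            {"type": "text", "text": "Extract the text from this document."},
        ],
    }],
    max_tokens=2048,
)
print(resp.choices[0].message.content)
```

Since only JPGs worked, PDFs would need to be rasterized first (e.g., one JPEG per page with pdf2image or PyMuPDF) and sent page by page.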
> I got this working by installing older transformers after vllm install: pip install "transformers<4.53.0".

Thank you! It worked for me!