vLLM compatibility issue with nanonets/Nanonets-OCR-s: Processor initialization conflict

#18
by WpythonW - opened

Issue Description

The nanonets/Nanonets-OCR-s model fails to load in vLLM (v0.9.1) due to a processor configuration conflict, while it works fine with transformers directly.

Error Details

Main Error:

TypeError: Qwen2_5_VLProcessor.__init__() got multiple values for argument 'image_processor'

Additional Issues:

  1. UTF-8 decoding error in processor config files:

    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb2 in position 8: invalid start byte
    
  2. The processor configuration appears to contain duplicate or conflicting image_processor arguments, which cause vLLM to fail during initialization (a quick way to inspect the shipped configs is sketched below).
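
As a minimal debugging sketch: dump the processor-related JSON files shipped with the repo and look for a duplicated image_processor entry. The file names below are the standard Hugging Face ones; whether each one actually exists in this repo is an assumption.

import json
from huggingface_hub import hf_hub_download

# Hypothetical debugging sketch: print each processor-related config so
# duplicated or conflicting "image_processor" keys become visible.
# Not every file below is guaranteed to exist in this repo.
for name in ("preprocessor_config.json", "processor_config.json", "tokenizer_config.json"):
    try:
        path = hf_hub_download("nanonets/Nanonets-OCR-s", name)
    except Exception:
        continue  # file not present in the repo
    # Opening with explicit UTF-8 will also surface the UnicodeDecodeError above
    with open(path, encoding="utf-8") as f:
        print(f"== {name} ==")
        print(json.dumps(json.load(f), indent=2))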

Environment

  • vLLM version: 0.9.1
  • transformers version: Latest
  • Model: nanonets/Nanonets-OCR-s
  • Hardware: NVIDIA A100-PCIE-40GB

Commands Attempted

# All of these fail with the same error
vllm serve nanonets/Nanonets-OCR-s --trust-remote-code --limit-mm-per-prompt image=3
vllm serve nanonets/Nanonets-OCR-s --trust-remote-code --enforce-eager
vllm serve nanonets/Nanonets-OCR-s --trust-remote-code --tokenizer-mode slow

Expected Behavior

The model should load successfully in vLLM, similar to how base Qwen/Qwen2.5-VL-3B-Instruct works.

Actual Behavior

vLLM fails during processor initialization with the "got multiple values for argument 'image_processor'" TypeError shown above.

Working Alternative

The model works perfectly with transformers directly:

from transformers import AutoModelForImageTextToText, AutoProcessor
model = AutoModelForImageTextToText.from_pretrained("nanonets/Nanonets-OCR-s", trust_remote_code=True)
processor = AutoProcessor.from_pretrained("nanonets/Nanonets-OCR-s", trust_remote_code=True)
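
For reference, a minimal inference sketch on top of that, assuming the standard Qwen2.5-VL chat-template flow in transformers; the prompt text and the sample.jpg path are placeholders:

from PIL import Image

image = Image.open("sample.jpg")  # placeholder input image
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Extract the text from this document."},
    ],
}]
# Build the chat prompt, run the processor on text + image, then generate
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=1024)
# Decode only the newly generated tokens, not the prompt
print(processor.batch_decode(output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True)[0])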

Request

Could you please:

  1. Fix the processor configuration to be compatible with vLLM
  2. Ensure all config files use proper UTF-8 encoding
  3. Test compatibility with vLLM during model releases

This model is very useful for OCR tasks, and vLLM compatibility would be greatly appreciated by the community.

Full Error Log

ERROR: Qwen2_5_VLProcessor.__init__() got multiple values for argument 'image_processor'
[Full traceback from your error logs...]

If anyone has this working on vLLM, I would appreciate tips!

I got this working by installing an older transformers after the vLLM install: pip install "transformers<4.53.0".


It worked for me with Docker, but only for JPG images, not PDFs directly (a PDF-to-JPEG workaround is sketched after the command).

export MODEL_PORT=8000
export MODEL_ID=nanonets/Nanonets-OCR-s

docker run \
--runtime nvidia \
-e VLLM_USE_V1=1 \
--gpus all \
--ipc=host \
-p "${MODEL_PORT}:8000" \
--env "HUGGING_FACE_HUB_TOKEN=${HUGGING_FACE_HUB_TOKEN}" \
-v "${HF_HOME}:/root/.cache/huggingface" \
vllm/vllm-openai:latest \
--model ${MODEL_ID}
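
Since only JPGs worked for me, here is a rough sketch for PDFs: rasterize each page to JPEG and send it to the OpenAI-compatible endpoint. It assumes pdf2image (which needs poppler installed) and the openai client; doc.pdf and the prompt text are placeholders.

import base64
import io

from openai import OpenAI
from pdf2image import convert_from_path  # requires poppler on the system

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Rasterize each PDF page to JPEG and send it as a base64 data URI
for page in convert_from_path("doc.pdf", dpi=200):  # placeholder path
    buf = io.BytesIO()
    page.save(buf, format="JPEG")
    b64 = base64.b64encode(buf.getvalue()).decode()
    resp = client.chat.completions.create(
        model="nanonets/Nanonets-OCR-s",
        messages=[{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
                {"type": "text", "text": "Extract the text from this page."},
            ],
        }],
    )
    print(resp.choices[0].message.content)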

Thank you! The pip install "transformers<4.53.0" workaround worked for me too!
