vLLM compatibility issue with nanonets/Nanonets-OCR-s: Processor initialization conflict

#18
by WpythonW - opened

Issue Description

The nanonets/Nanonets-OCR-s model fails to load in vLLM (v0.9.1) due to a processor configuration conflict, while it works fine with transformers directly.

Error Details

Main Error:

TypeError: Qwen2_5_VLProcessor.__init__() got multiple values for argument 'image_processor'

Additional Issues:

  1. UTF-8 decoding error in processor config files:

    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb2 in position 8: invalid start byte
    
  2. The processor configuration appears to contain duplicate or conflicting image_processor arguments, which cause vLLM to fail during initialization (a quick way to inspect the shipped configs is sketched below).
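
As a minimal debugging sketch: dump the processor-related JSON files shipped with the repo and look for a duplicated image_processor entry. The file names below are the standard Hugging Face ones; whether each one actually exists in this repo is an assumption.

import json
from huggingface_hub import hf_hub_download

# Hypothetical debugging sketch: print each processor-related config so
# duplicated or conflicting "image_processor" keys become visible.
# Not every file below is guaranteed to exist in this repo.
for name in ("preprocessor_config.json", "processor_config.json", "tokenizer_config.json"):
    try:
        path = hf_hub_download("nanonets/Nanonets-OCR-s", name)
    except Exception:
        continue  # file not present in the repo
    # Opening with explicit UTF-8 will also surface the UnicodeDecodeError above
    with open(path, encoding="utf-8") as f:
        print(f"== {name} ==")
        print(json.dumps(json.load(f), indent=2))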

Environment

  • vLLM version: 0.9.1
  • transformers version: Latest
  • Model: nanonets/Nanonets-OCR-s
  • Hardware: NVIDIA A100-PCIE-40GB

Commands Attempted

# All of these fail with the same error
vllm serve nanonets/Nanonets-OCR-s --trust-remote-code --limit-mm-per-prompt image=3
vllm serve nanonets/Nanonets-OCR-s --trust-remote-code --enforce-eager
vllm serve nanonets/Nanonets-OCR-s --trust-remote-code --tokenizer-mode slow

Expected Behavior

The model should load successfully in vLLM, similar to how base Qwen/Qwen2.5-VL-3B-Instruct works.

Actual Behavior

vLLM fails during processor initialization with the "got multiple values for argument 'image_processor'" TypeError shown above.

Working Alternative

The model works perfectly with transformers directly:

from transformers import AutoModelForImageTextToText, AutoProcessor
model = AutoModelForImageTextToText.from_pretrained("nanonets/Nanonets-OCR-s", trust_remote_code=True)
processor = AutoProcessor.from_pretrained("nanonets/Nanonets-OCR-s", trust_remote_code=True)
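
For reference, a minimal inference sketch on top of that, assuming the standard Qwen2.5-VL chat-template flow in transformers; the prompt text and the sample.jpg path are placeholders:

from PIL import Image

image = Image.open("sample.jpg")  # placeholder input image
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Extract the text from this document."},
    ],
}]
# Build the chat prompt, run the processor on text + image, then generate
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=1024)
# Decode only the newly generated tokens, not the prompt
print(processor.batch_decode(output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True)[0])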

Request

Could you please:

  1. Fix the processor configuration to be compatible with vLLM
  2. Ensure all config files use proper UTF-8 encoding
  3. Test compatibility with vLLM during model releases

This model is very useful for OCR tasks, and vLLM compatibility would be greatly appreciated by the community.

Full Error Log

ERROR: Qwen2_5_VLProcessor.__init__() got multiple values for argument 'image_processor'
[Full traceback from your error logs...]

If anyone has this working on vLLM, I would appreciate tips!

I got this working by installing an older transformers after the vLLM install: pip install "transformers<4.53.0".


It worked for me with Docker, but only for JPG images, not PDFs directly (a PDF-to-JPEG workaround is sketched after the command).

export MODEL_PORT=8000
export MODEL_ID=nanonets/Nanonets-OCR-s

docker run \
--runtime nvidia \
-e VLLM_USE_V1=1 \
--gpus all \
--ipc=host \
-p "${MODEL_PORT}:8000" \
--env "HUGGING_FACE_HUB_TOKEN=${HUGGING_FACE_HUB_TOKEN}" \
-v "${HF_HOME}:/root/.cache/huggingface" \
vllm/vllm-openai:latest \
--model ${MODEL_ID}
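
Since only JPGs worked for me, here is a rough sketch for PDFs: rasterize each page to JPEG and send it to the OpenAI-compatible endpoint. It assumes pdf2image (which needs poppler installed) and the openai client; doc.pdf and the prompt text are placeholders.

import base64
import io

from openai import OpenAI
from pdf2image import convert_from_path  # requires poppler on the system

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Rasterize each PDF page to JPEG and send it as a base64 data URI
for page in convert_from_path("doc.pdf", dpi=200):  # placeholder path
    buf = io.BytesIO()
    page.save(buf, format="JPEG")
    b64 = base64.b64encode(buf.getvalue()).decode()
    resp = client.chat.completions.create(
        model="nanonets/Nanonets-OCR-s",
        messages=[{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
                {"type": "text", "text": "Extract the text from this page."},
            ],
        }],
    )
    print(resp.choices[0].message.content)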

Thank you! The pip install "transformers<4.53.0" workaround worked for me too!
