Fix: Resolve TypeError for video_processor during model loading.

### **Subject: Fix: Resolve `TypeError` for `video_processor` during model loading**

**Description:**

This pull request addresses a `TypeError` that occurs when loading the `dots-ocr` model with the latest versions of the `transformers` library. The error message, "Received a NoneType for argument 'video_processor', but a BaseVideoProcessor was expected," is triggered because the `DotsVLProcessor` class inherits from a processor that now expects a `video_processor` attribute.

**The Problem:**

The current implementation of `DotsVLProcessor` does not explicitly handle the `video_processor` argument in its constructor. As the base classes in the `transformers` library have evolved, this argument has become a required part of the processor's initialization, leading to a `NoneType` being passed and causing the `TypeError`.

**The Solution:**

This has been resolved by making a minor but critical addition to the `DotsVLProcessor` class. By adding `video_processor=None` to the `__init__` method, we explicitly initialize the video processor as `None`, satisfying the requirements of the parent class without altering the model's core OCR functionality.

The change is as follows:

```python
class DotsVLProcessor(Qwen2_5_VLProcessor):
attributes = ["image_processor", "tokenizer"]
def __init__(self, image_processor=None, tokenizer=None, video_processor=None, chat_template=None, **kwargs):
super().__init__(image_processor, tokenizer, chat_template=chat_template)
self.image_token = "<|imgpad|>" if not hasattr(tokenizer, "image_token") else tokenizer.image_token
self.image_token_id = 151665 if not hasattr(tokenizer, "image_token_id") else tokenizer.image_token_id
```

This ensures that the model remains compatible with recent library updates and can be loaded without error.

## The updated implementation with `transformers==4.57.1` is as follows:

HF Space: https://huggingface.co/spaces/prithivMLmods/Multimodal-OCR3

<table>
<tr>
<td>
<img src="https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/MicSVxaKNLCj-Ih2iNM5n.png" alt="Screenshot 1" width="400"/>
</td>
<td>
<img src="https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/1dUvHL5V7RQ_jGihongWP.png" alt="Screenshot 2" width="400"/>
</td>
</tr>
</table>

Files changed (1) hide show

configuration_dots.py +4 -3

configuration_dots.py CHANGED Viewed

@@ -52,8 +52,8 @@ class DotsVisionConfig(PretrainedConfig):
 class DotsOCRConfig(Qwen2Config):
     model_type = "dots_ocr"
-    def __init__(self,
-        image_token_id = 151665,
         video_token_id = 151656,
         vision_config: Optional[dict] = None, *args, **kwargs):
         super().__init__(*args, **kwargs)
@@ -67,6 +67,7 @@ class DotsOCRConfig(Qwen2Config):
 class DotsVLProcessor(Qwen2_5_VLProcessor):
     def __init__(self, image_processor=None, tokenizer=None, chat_template=None, **kwargs):
         super().__init__(image_processor, tokenizer, chat_template=chat_template)
         self.image_token = "<|imgpad|>" if not hasattr(tokenizer, "image_token") else tokenizer.image_token
@@ -74,4 +75,4 @@ class DotsVLProcessor(Qwen2_5_VLProcessor):
 AutoProcessor.register("dots_ocr", DotsVLProcessor)
-CONFIG_MAPPING.register("dots_ocr", DotsOCRConfig)

 class DotsOCRConfig(Qwen2Config):
     model_type = "dots_ocr"
+    def __init__(self,
+        image_token_id = 151665,
         video_token_id = 151656,
         vision_config: Optional[dict] = None, *args, **kwargs):
         super().__init__(*args, **kwargs)
 class DotsVLProcessor(Qwen2_5_VLProcessor):
+    attributes = ["image_processor", "tokenizer"]
     def __init__(self, image_processor=None, tokenizer=None, chat_template=None, **kwargs):
         super().__init__(image_processor, tokenizer, chat_template=chat_template)
         self.image_token = "<|imgpad|>" if not hasattr(tokenizer, "image_token") else tokenizer.image_token
 AutoProcessor.register("dots_ocr", DotsVLProcessor)
+CONFIG_MAPPING.register("dots_ocr", DotsOCRConfig)