---
license: other
language:
- en
- zh
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
pipeline_tag: image-text-to-text
library_name: transformers
tags:
- multimodal
- medical
---

# BaichuanMed-OCR-7B

## Model Overview

BaichuanMed-OCR-7B is fine-tuned from Qwen2.5-VL-7B-Instruct on a medical report dataset that we constructed and curated, consisting of medical report images and related question-answer (QA) pairs. It is specifically adapted to perform Optical Character Recognition (OCR) on medical report images and to answer questions based on the extracted content.

### Capabilities

* **Robust OCR**: Accurately recognizes complex textual content in medical report images.
* **Structured Markdown Output**: Supports outputting the extracted information in Markdown format.
* **Accurate Comprehension and Generation**: Comprehends user queries and generates relevant, logically consistent answers grounded in the OCR-extracted text.

## Evaluation

To evaluate the effectiveness of BaichuanMed-OCR-7B on medical report data, we ran a benchmark on our private dataset (similar in composition to https://huggingface.co/datasets/mrlijun/SMR-R1), comparing its performance against relevant baseline models. The accuracy results are as follows:

| Model | Accuracy |
|-------------------------|-----------|
| Qwen2.5-VL-7B-Instruct | 71.3% |
| BaichuanMed-OCR-7B | 83.5% |
| Qwen2.5-VL-72B-Instruct | 83.3% |
| BaichuanMed-OCR-72B | **88.6%** |

The benchmark results indicate that BaichuanMed-OCR-7B and BaichuanMed-OCR-72B achieve higher accuracy than their respective base models on this dataset, showing strong performance on tasks such as extracting key information and summarizing content from medical reports.

## Usage

Usage is the same as for Qwen2.5-VL. Here is an example of using the chat model with `transformers` and `qwen_vl_utils`:

```python
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "baichuan-inc/BaichuanMed-OCR-7B", torch_dtype="auto", device_map="auto"
)

# Default processor
processor = AutoProcessor.from_pretrained("baichuan-inc/BaichuanMed-OCR-7B")

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "your image url or PATH here.",
            },
            # Prompt (Chinese): "Extract the text and important symbols from the
            # image; output tables as tables. If there is a watermark, recognize
            # it only once and do not repeat the same watermark content."
            {"type": "text", "text": "提取图片中的文字和重要符号,有表格的就用表格输出。如果有水印,只需要识别一次,不要重复输出同样的水印内容。"},
        ],
    }
]

# Preparation for inference
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
inputs = inputs.to("cuda")

# Inference: generate the output, then strip the prompt tokens from each sequence
generated_ids = model.generate(**inputs, max_new_tokens=8192)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
```

For question answering over the extracted text, the conversation can be extended with a follow-up turn; see the sketch at the end of this card.

## Citation

If you find our work helpful, please cite it as follows:

```
@misc{lijun2025BaichuanMed-OCR-7B,
  author = {Lijun Liu and Tao Zhang and Tao Zhang and Chong Li and Mingrui Wang and Chenglin Zhu and Mingan Lin and Zenan Zhou and Weipeng Chen},
  title  = {BaichuanMed-OCR-7B: A powerful medical report OCR recognition model},
  year   = {2025}
}
```
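## Follow-up QA Example

Since the model is described as answering questions based on the OCR-extracted content, a second conversation turn can be appended after extraction. The following is a minimal sketch, not an official recipe: it reuses `model`, `processor`, `messages`, and `output_text` from the usage snippet above, and the follow-up question text is purely illustrative.

```python
# Continues from the usage snippet above: `model`, `processor`, `messages`,
# and `output_text` are assumed to be in scope.

# Append the model's OCR output as an assistant turn.
messages.append(
    {"role": "assistant", "content": [{"type": "text", "text": output_text[0]}]}
)
# Ask a follow-up question about the extracted report (illustrative text).
messages.append(
    {
        "role": "user",
        "content": [{"type": "text", "text": "Based on the extracted report, summarize the key findings."}],
    }
)

# Re-run the same preparation and generation steps as before.
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
).to("cuda")

generated_ids = model.generate(**inputs, max_new_tokens=512)
answer = processor.batch_decode(
    [out[len(inp):] for inp, out in zip(inputs.input_ids, generated_ids)],
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False,
)
print(answer[0])
```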