---
license: other
language:
- en
- zh
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
pipeline_tag: image-text-to-text
library_name: transformers
tags:
- multimodal
- medical
---

# BaichuanMed-OCR-7B

## Model Overview

BaichuanMed-OCR-7B is fine-tuned from Qwen2.5-VL-7B-Instruct on a medical report dataset that we constructed and curated, consisting of medical report images and related question-answer (QA) pairs. It is specifically adapted to perform Optical Character Recognition (OCR) on medical report images and to answer questions based on the extracted content.

### Capabilities

* **Robust OCR**: Accurately recognizes complex textual content in medical report images.
* **Structured Markdown Output**: Supports outputting the extracted information in Markdown format.
* **Accurate Comprehension and Generation**: Comprehends user queries and generates relevant, logically consistent answers grounded in the OCR-extracted text.

## Evaluation

To evaluate the effectiveness of BaichuanMed-OCR-7B on medical report data, we ran a benchmark on our private dataset (similar in composition to https://huggingface.co/datasets/mrlijun/SMR-R1), comparing its performance against relevant baseline models. The accuracy results are as follows:

| Model | Accuracy |
|-------------------------|-----------|
| Qwen2.5-VL-7B-Instruct | 71.3% |
| BaichuanMed-OCR-7B | 83.5% |
| Qwen2.5-VL-72B-Instruct | 83.3% |
| BaichuanMed-OCR-72B | **88.6%** |

The benchmark results indicate that BaichuanMed-OCR-7B and BaichuanMed-OCR-72B achieve higher accuracy than their respective base models on this dataset, showing strong performance on tasks such as extracting key information and summarizing content from medical reports.

## Usage

Usage is the same as for Qwen2.5-VL. Here is an example of using the chat model with `transformers` and `qwen_vl_utils`:

```python
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "baichuan-inc/BaichuanMed-OCR-7B", torch_dtype="auto", device_map="auto"
)

# Default processor
processor = AutoProcessor.from_pretrained("baichuan-inc/BaichuanMed-OCR-7B")

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "your image url or PATH here.",
            },
            # Prompt (Chinese): "Extract the text and important symbols from the
            # image; output tables as tables. If there is a watermark, recognize
            # it only once and do not repeat the same watermark content."
            {"type": "text", "text": "提取图片中的文字和重要符号,有表格的就用表格输出。如果有水印,只需要识别一次,不要重复输出同样的水印内容。"},
        ],
    }
]

# Preparation for inference
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
inputs = inputs.to("cuda")

# Inference: generate the output, then strip the prompt tokens from each sequence
generated_ids = model.generate(**inputs, max_new_tokens=8192)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
```

For question answering over the extracted text, the conversation can be extended with a follow-up turn; see the sketch at the end of this card.

## Citation

If you find our work helpful, please cite it as follows:

```
@misc{lijun2025BaichuanMed-OCR-7B,
  author = {Lijun Liu and Tao Zhang and Tao Zhang and Chong Li and Mingrui Wang and Chenglin Zhu and Mingan Lin and Zenan Zhou and Weipeng Chen},
  title  = {BaichuanMed-OCR-7B: A powerful medical report OCR recognition model},
  year   = {2025}
}
```
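## Follow-up QA Example

Since the model is described as answering questions based on the OCR-extracted content, a second conversation turn can be appended after extraction. The following is a minimal sketch, not an official recipe: it reuses `model`, `processor`, `messages`, and `output_text` from the usage snippet above, and the follow-up question text is purely illustrative.

```python
# Continues from the usage snippet above: `model`, `processor`, `messages`,
# and `output_text` are assumed to be in scope.

# Append the model's OCR output as an assistant turn.
messages.append(
    {"role": "assistant", "content": [{"type": "text", "text": output_text[0]}]}
)
# Ask a follow-up question about the extracted report (illustrative text).
messages.append(
    {
        "role": "user",
        "content": [{"type": "text", "text": "Based on the extracted report, summarize the key findings."}],
    }
)

# Re-run the same preparation and generation steps as before.
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
).to("cuda")

generated_ids = model.generate(**inputs, max_new_tokens=512)
answer = processor.batch_decode(
    [out[len(inp):] for inp, out in zip(inputs.input_ids, generated_ids)],
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False,
)
print(answer[0])
```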