---
license: mit
language:
- en
base_model:
- OpenGVLab/InternVL3-8B
pipeline_tag: image-text-to-text
---

## 🤔 Model

We introduce Chiron-o1, a new medical MLLM based on a curriculum learning strategy and clinical chain-of-thought data, with robust visual question-answering and generalizable reasoning capabilities.

Code will be available at https://github.com/manglu097/Chiron-o1

The example below runs text-only reasoning with [transformers](https://huggingface.co/docs/transformers/index). For multimodal tasks, see the inference script [here](https://github.com/manglu097/Chiron-o1/blob/main/infer.py).

```python
from transformers import AutoModel, AutoTokenizer
import torch

path = 'manglu3935/Chiron-o1-8B'

# Load the model in bfloat16; trust_remote_code is required because the
# model class (including the chat method used below) ships with the checkpoint.
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    load_in_8bit=False,
    low_cpu_mem_usage=True,
    use_flash_attn=True,
    trust_remote_code=True,
    device_map="auto").eval()
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True, use_fast=False)

# pure text inference
question = (
    "Which of the following imaging findings is most consistent with a pure arterial malformation (PAM)?\n"
    "A) A vascular network connecting arteries and veins with early venous drainage \n"
    "B) A dilated, tortuous arterial loop without venous communication \n"
    "C) A focal saccular outpouching of a cerebral artery with surrounding edema \n"
    "D) A venous varix with adjacent arterial feeders\n"
    "Let's reason step-by-step to answer the above question."
)
generation_config = dict(max_new_tokens=1024, do_sample=True)

# The second argument is the image input; None selects text-only inference.
response = model.chat(tokenizer, None, question, generation_config)
print(f'User: {question}\nAssistant: {response}')
```

## 📖 Citation

```
@article{sun2025enhancingstepbystepverifiablemedical,
  title={Enhancing Step-by-Step and Verifiable Medical Reasoning in MLLMs},
  author={Haoran Sun and Yankai Jiang and Wenjie Lou and Yujie Zhang and Wenjie Li and Lilong Wang and Mianxin Liu and Lei Liu and Xiaosong Wang},
  journal={arXiv preprint arXiv:2506.16962},
  year={2025}
}
```
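
The text-only example in the Model section hard-codes a multiple-choice prompt: the question stem, options labelled `A)` to `D)` on separate lines, and a closing step-by-step instruction. A minimal sketch of a helper that assembles prompts in that shape is shown here. The name `build_mcq_prompt` and the constant `COT_SUFFIX` are hypothetical and not part of the Chiron-o1 code; only the prompt layout is taken from the example.

```python
from typing import Sequence

# Closing instruction copied from the example prompt in the Model section.
COT_SUFFIX = "Let's reason step-by-step to answer the above question."


def build_mcq_prompt(stem: str, options: Sequence[str]) -> str:
    """Hypothetical helper: lay out a multiple-choice question like the example prompt."""
    if not 1 <= len(options) <= 26:
        raise ValueError("expected between 1 and 26 options")
    lines = [stem.strip()]
    # Label the options A), B), C), ... with one option per line.
    for index, option in enumerate(options):
        label = chr(ord("A") + index)
        lines.append(f"{label}) {option.strip()}")
    lines.append(COT_SUFFIX)
    return "\n".join(lines)


prompt = build_mcq_prompt(
    "Which of the following imaging findings is most consistent with a pure arterial malformation (PAM)?",
    [
        "A vascular network connecting arteries and veins with early venous drainage",
        "A dilated, tortuous arterial loop without venous communication",
        "A focal saccular outpouching of a cerebral artery with surrounding edema",
        "A venous varix with adjacent arterial feeders",
    ],
)
print(prompt)
```

The resulting string can be passed as `question` to `model.chat(tokenizer, None, question, generation_config)` exactly as in the example. Whether this prompt layout is required for good results is not stated in the model card, so treat it as a convention borrowed from the example rather than a documented requirement.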