---
license: mit
language:
- en
base_model:
- OpenGVLab/InternVL3-8B
pipeline_tag: image-text-to-text
---
## 🤔 Model
We introduce Chiron-o1,  a new medical MLLM based on a curriculum learning strategy and clinical chain-of-thought data, with robust visual question-answering and generalizable reasoning capabilities. 
Code will be available at https://github.com/manglu097/Chiron-o1

We provide an example of pure text reasoning using [transformers](https://huggingface.co/docs/transformers/index). For multimodal tasks, you can refer to the information [here](https://github.com/manglu097/Chiron-o1/blob/main/infer.py).

```python
from transformers import AutoModel, AutoTokenizer
import torch

path = 'manglu3935/Chiron-o1-8B'
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    load_in_8bit=False,
    low_cpu_mem_usage=True,
    use_flash_attn=True,
    trust_remote_code=True,
    device_map="auto").eval()
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True, use_fast=False)

# pure text inference
question = "Which of the following imaging findings is most consistent with a pure arterial malformation (PAM)?\nA) A vascular network connecting arteries and veins with early venous drainage  \nB) A dilated, tortuous arterial loop without venous communication  \nC) A focal saccular outpouching of a cerebral artery with surrounding edema  \nD) A venous varix with adjacent arterial feeders\nLet's reason step-by-step to answer the above question."
generation_config = dict(max_new_tokens=1024, do_sample=True)
response = model.chat(tokenizer, None, question, generation_config)
print(f'User: {question}\nAssistant: {response}')
```
## 📖 Citation

```
@article{sun2025enhancingstepbystepverifiablemedical,
  title={Enhancing Step-by-Step and Verifiable Medical Reasoning in MLLMs},
  author={Haoran Sun and Yankai Jiang and Wenjie Lou and Yujie Zhang and Wenjie Li and Lilong Wang and Mianxin Liu and Lei Liu and Xiaosong Wang},
  journal={arXiv preprint arXiv:2506.16962},
  year={2025}
}
```