---
license: apache-2.0
base_model:
- microsoft/Phi-3.5-vision-instruct
tags:
- medical-vision-language-model
- chest-xray
- preference-optimization
metrics:
- accuracy
- f1
language:
- en
pipeline_tag: visual-question-answering
library_name: transformers
---

# CheX-Phi3.5V — Preference-Optimised Vision-Language Model for Chest X-ray Interpretation

**CheX-Phi3.5V** is a vision–language model (VLM) that answers clinical questions about chest radiographs and generates structured findings reports.
Built on **Phi-3.5 Vision-Instruct (4.2 B parameters)**, it applies **Direct Preference Optimization (DPO)** and **contrastive rejection learning** to achieve fine-grained medical reasoning while suppressing hallucinations.

---

## Key Features

| Aspect          | Description                                                                                    |
|-----------------|------------------------------------------------------------------------------------------------|
| **Modality**    | Single-image chest radiography (frontal & lateral)                                             |
| **Tasks**       | Visual Question Answering (VQA) & Findings generation                                          |
| **Backbone**    | Phi-3.5 Vision (4.2 B) with an enhanced visual projection layer                                |
| **Optimisation**| 2-stage SFT → DPO + contrastive rejection learning                                             |
| **License**     | Apache 2.0 — free for research **and** commercial use                                          |

---

## Quick Start

```python
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor

model_id  = "remove4anonymous/CheX-Phi-3.5-vision-instruct-DPO"
model     = AutoModelForVision2Seq.from_pretrained(
    model_id, torch_dtype="auto", trust_remote_code=True
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# The processor expects a decoded image, not a file path.
image  = Image.open("example_frontal.jpg").convert("RGB")
inputs = processor(
    images=image,
    text="Question: What abnormalities are present?\nAnswer:",
    return_tensors="pt"
).to(model.device)

generated = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(generated[0], skip_special_tokens=True))
```

> **Dependencies** — `pip install "transformers>=4.41.0" timm accelerate`
> For batch inference or a Streamlit demo, see the scripts in the [GitHub repo](https://github.com/remove4anonymous/CheX-Phi35V).
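
If GPU memory is tight, the checkpoint can typically be loaded in 4-bit with `bitsandbytes`. This is the generic `transformers` quantisation pattern, not something this card documents, so treat it as an untested sketch:

```python
import torch
from transformers import AutoModelForVision2Seq, AutoProcessor, BitsAndBytesConfig

# Untested sketch: 4-bit NF4 quantisation (requires `pip install bitsandbytes`).
model_id = "remove4anonymous/CheX-Phi-3.5-vision-instruct-DPO"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForVision2Seq.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
```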

---

## Available Checkpoints

| HF Repo                                                                                                       | Stage         | Recommended Use           |
| ------------------------------------------------------------------------------------------------------------- | ------------- | ------------------------- |
| [`CheX-Phi3.5-vision-instruct-DPO`](https://huggingface.co/remove4anonymous/CheX-Phi-3.5-vision-instruct-DPO) | **DPO**       | Production / evaluation   |
| [`CheX-Phi3.5-vision-instruct-SFT`](https://huggingface.co/remove4anonymous/CheX-Phi-3.5-vision-instruct-SFT) | SFT (phase 2) | Further preference tuning |
| [`Phi-3.5-vision-instruct`](https://huggingface.co/remove4anonymous/Phi-3.5-vision-instruct)                  | Base          | Custom fine-tuning        |

---

## Training Data & Procedure

| Stage           | Data & Size                         | Objective                                 |
| --------------- | ----------------------------------- | ----------------------------------------- |
| **SFT**         | 120 k QA triplets (`MIMIC-EXT VQA`) | Supervised instruction tuning             |
| **DPO**         | 30 k preference-paired QA           | Direct Preference Optimization            |
| **Contrastive** | 250 k unlabelled MIMIC-CXR images   | Rejection learning to curb hallucinations |

* **Hardware** — 8 × A100 80 GB • FP16 • DeepSpeed ZeRO-3
* **Total steps** — ≈ 2.2 M
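
The card does not ship the training code, but the DPO stage refers to the standard direct-preference objective of Rafailov et al. (2023). A minimal PyTorch sketch of that loss (the function name and the β = 0.1 default are illustrative, not taken from this repo):

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Standard DPO objective: prefer the chosen answer over the rejected
    one, measured relative to a frozen reference model."""
    # Implicit rewards are the beta-scaled log-prob ratios vs. the reference.
    chosen_rewards   = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Negative log-sigmoid of the reward margin, averaged over pairs.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```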

---

## Evaluation

| Dataset                 | Split     | Metric    | Score     |
| ----------------------- | --------- | --------- | --------- |
| MIMIC-CXR VQA           | test      | Accuracy  | **0.894** |
| OpenI CXR-QA            | test      | BLEU-4    | **79.4**  |
| Radiologist Turing Test | 200 cases | Pass rate | 61 %      |

Evaluation scripts are provided in `stage3_evaluate_mimic_ext_vqa.sh`.
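
The official protocol lives in that script. As a rough illustration of the closed-ended VQA accuracy metric, an exact-match scorer might look like the following (the normalisation rules are assumptions, not the repo's):

```python
def exact_match_accuracy(predictions: list[str], references: list[str]) -> float:
    """Exact-match accuracy after light normalisation
    (lower-casing, whitespace collapse, trailing-period removal)."""
    def normalise(answer: str) -> str:
        return " ".join(answer.lower().strip().rstrip(".").split())
    hits = sum(normalise(p) == normalise(r) for p, r in zip(predictions, references))
    return hits / len(references)

# exact_match_accuracy(["No acute findings."], ["no acute findings"]) -> 1.0
```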

---

## Ethical & Safety Considerations

* **Clinical usage** — Outputs are *assistive* only; a certified radiologist must confirm findings.
* **Bias** — Training data skewed towards North-American adult populations; paediatric or non-western cohorts may underperform.
* **Privacy** — MIMIC-CXR is fully de-identified; the model does not memorise PHI.
* **Hallucinations** — Contrastive rejection reduces but does not eliminate false positives; apply confidence thresholds (a sketch follows this list).
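
The card does not define how confidence should be computed. One common proxy is the mean token log-probability of the generated answer, available through `transformers`' `compute_transition_scores`; the 0.5 cut-off below is an illustrative assumption, not a validated operating point:

```python
import torch

# Assumes `model`, `processor`, and `inputs` from the Quick Start above.
outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    return_dict_in_generate=True,
    output_scores=True,
)
# Per-token log-probabilities of the generated tokens.
transition_scores = model.compute_transition_scores(
    outputs.sequences, outputs.scores, normalize_logits=True
)
confidence = torch.exp(transition_scores[0].mean()).item()

if confidence < 0.5:  # illustrative threshold, not clinically validated
    print(f"Low-confidence answer ({confidence:.2f}); defer to a radiologist.")
```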

---

## Known Limitations

1. No generalisation to CT, MRI, or ultrasound modalities.
2. Sensitive to extreme noise & portable AP projections.
3. Knowledge cutoff = Mar 2023; newly described conditions may be missed.

---

## Resources

* **Code & training scripts** — [https://github.com/remove4anonymous/CheX-Phi35V](https://github.com/remove4anonymous/CheX-Phi35V)
* **Data utilities** — `tools/generate_visual_prompt.py`
* **Demo script** — `demo.py`

---

## Citation

```bibtex
@misc{liu2025chexphi35v,
  title        = {CheX-Phi3.5V: Preference-Optimised Vision-Language Model for Chest X-ray Interpretation},
  author       = {Liu, Xiao and Others},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/remove4anonymous/CheX-Phi-3.5-vision-instruct-DPO}}
}
```

> If you use **CheX-Phi3.5V**, please cite us and consider sharing your downstream results!