---
base_model:
- google/gemma-3n-E2B-it
tags:
- text-generation-inference
- transformers
- unsloth
- gemma3n
- medical
- vision-language
- gemma
- ecg
- cardiology
- healthcare
license: cc-by-4.0
datasets:
  - yasserrmd/pulse-ecg-instruct-subset
language:
- en
---



# GemmaECG-Vision

<img src="GemmaECG Vision_ Future of Health.png" width="800" />

`GemmaECG-Vision` is a fine-tuned vision-language model built on `google/gemma-3n-E2B-it`, designed for ECG image interpretation. The model accepts a medical ECG image together with a clinical instruction prompt and generates a structured analysis suitable for triage or documentation use cases.

This model was developed using **Unsloth** for efficient fine-tuning and supports **image + text** inputs with medical task-specific prompt formatting. It is designed to run in **offline or edge environments**, enabling healthcare triage in resource-constrained settings.

## Model Objective

To assist healthcare professionals and emergency responders by providing AI-generated ECG analysis directly from medical images, without requiring internet access or cloud resources.

## Usage

This model expects:
- An ECG image (`PIL.Image`)
- A textual instruction such as:

```

You are a clinical assistant specialized in ECG interpretation. Given an ECG image, generate a concise, structured, and medically accurate report.

Use this exact format:

Rhythm:
PR Interval:
QRS Duration:
Axis:
Bundle Branch Blocks:
Atrial Abnormalities:
Ventricular Hypertrophy:
Q Wave or QS Complexes:
T Wave Abnormalities:
ST Segment Changes:
Final Impression:

```
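
Because the model is prompted to emit a fixed set of labeled fields, downstream code can turn the generated report into structured data. The sketch below is illustrative only; the `parse_ecg_report` helper is not part of the model or its repository.

```python
# Illustrative helper (not shipped with the model): split the structured
# report format above into a {field: value} dictionary.
def parse_ecg_report(text: str) -> dict:
    fields = {}
    for line in text.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    return fields

report = """Rhythm: Normal sinus rhythm
PR Interval: 160 ms
QRS Duration: 90 ms
Final Impression: No acute abnormality"""

parsed = parse_ecg_report(report)
print(parsed["Rhythm"])  # Normal sinus rhythm
```

A dictionary like this is easy to validate (e.g. checking that all expected fields are present) before the report is stored or surfaced in a triage workflow.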

### Inference Example (Python)

```python
from transformers import AutoProcessor, Gemma3nForConditionalGeneration
from PIL import Image
import torch

model_id = "yasserrmd/GemmaECG-Vision"
device = "cuda" if torch.cuda.is_available() else "cpu"

model = Gemma3nForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16
).eval().to(device)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("example_ecg.png").convert("RGB")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Interpret this ECG and provide a structured triage report."}
        ]
    }
]

prompt = processor.apply_chat_template(messages, add_generation_prompt=True)

inputs = processor(text=prompt, images=image, return_tensors="pt").to(device)

outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,  # required for temperature/top_p/top_k to take effect
    temperature=1.0,
    top_p=0.95,
    top_k=64,
    use_cache=True,
)

# Decode only the newly generated tokens, skipping the echoed prompt.
result = processor.decode(
    outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
)
print(result)
```

## Training Details

* **Framework**: Unsloth + TRL SFTTrainer
* **Hardware**: Google Colab Pro (L4)
* **Batch Size**: 2
* **Epochs**: 1
* **Learning Rate**: 2e-4
* **Scheduler**: Cosine
* **Loss**: CrossEntropy
* **Precision**: bfloat16
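
With a cosine scheduler, the learning rate decays from the 2e-4 peak toward zero over the run. A minimal sketch of that schedule, ignoring any warmup phase the actual Unsloth/TRL run may have used:

```python
import math

def cosine_lr(step: int, total_steps: int, peak_lr: float = 2e-4) -> float:
    """Cosine decay from peak_lr at step 0 to ~0 at total_steps (no warmup)."""
    return 0.5 * peak_lr * (1 + math.cos(math.pi * step / total_steps))

total = 409  # number of training steps in this run
print(cosine_lr(0, total))      # 2e-4 at the start
print(cosine_lr(total, total))  # ~0 at the end
```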

## Dataset

The training dataset is a curated subset of the [PULSE-ECG/ECGInstruct](https://huggingface.co/datasets/PULSE-ECG/ECGInstruct) dataset, reformatted for VLM instruction tuning.

* 3,272 samples of ECG image + structured instruction + clinical output
* Focused on realistic and medically relevant triage cases

Dataset link: [`yasserrmd/pulse-ecg-instruct-subset`](https://huggingface.co/datasets/yasserrmd/pulse-ecg-instruct-subset)



### **Training Loss Summary**

<img src="tl.png" >

The model was fine-tuned for 409 steps on the `pulse-ecg-instruct-subset` dataset. Training loss started above **9.5** and declined steadily to below **0.5**, showing consistent convergence over the single epoch. The curve above reflects a stable optimization process without overfitting spikes, highlighting how quickly the model adapted to the ECG image-to-text task.




## Intended Use

* Emergency triage in offline settings
* On-device ECG assessment
* Integration with medical edge devices (Jetson, Pi, Android)
* Rapid analysis during disaster response

## Limitations

* Not intended to replace licensed medical professionals
* Accuracy may vary depending on image quality
* Model outputs should be reviewed by a clinician before action

## License

This model is licensed under **CC BY 4.0**. You are free to use, modify, and distribute it with attribution.

## Author

Mohamed Yasser
[Hugging Face Profile](https://huggingface.co/yasserrmd)




[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)