---
base_model:
- google/gemma-3n-E2B-it
tags:
- text-generation-inference
- transformers
- unsloth
- gemma3n
- medical
- vision-language
- gemma
- ecg
- cardiology
- healthcare
license: cc-by-4.0
datasets:
- yasserrmd/pulse-ecg-instruct-subset
language:
- en
---
# GemmaECG-Vision
<img src="GemmaECG Vision_ Future of Health.png" width="800" />
`GemmaECG-Vision` is a fine-tuned vision-language model built on `google/gemma-3n-E2B-it`, designed for ECG image interpretation. The model accepts an ECG image together with a clinical instruction prompt and generates a structured analysis suitable for triage or documentation use cases.
This model was developed using **Unsloth** for efficient fine-tuning and supports **image + text** inputs with medical task-specific prompt formatting. It is designed to run in **offline or edge environments**, enabling healthcare triage in resource-constrained settings.
## Model Objective
To assist healthcare professionals and emergency responders by providing AI-generated ECG analysis directly from medical images, without requiring internet access or cloud resources.
## Usage
This model expects:
- An ECG image (`PIL.Image`)
- A textual instruction such as:
```
You are a clinical assistant specialized in ECG interpretation. Given an ECG image, generate a concise, structured, and medically accurate report.
Use this exact format:
Rhythm:
PR Interval:
QRS Duration:
Axis:
Bundle Branch Blocks:
Atrial Abnormalities:
Ventricular Hypertrophy:
Q Wave or QS Complexes:
T Wave Abnormalities:
ST Segment Changes:
Final Impression:
```
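In practice, this structured instruction is supplied as the text part of the chat message that accompanies the image. A minimal sketch (variable names here are illustrative, not part of the model card):
```python
# Sketch: embed the structured ECG instruction as the text content of the chat
# message. The image itself is passed to the processor separately (see below).
ECG_INSTRUCTION = (
    "You are a clinical assistant specialized in ECG interpretation. "
    "Given an ECG image, generate a concise, structured, and medically accurate report.\n"
    "Use this exact format:\n"
    "Rhythm:\nPR Interval:\nQRS Duration:\nAxis:\nBundle Branch Blocks:\n"
    "Atrial Abnormalities:\nVentricular Hypertrophy:\nQ Wave or QS Complexes:\n"
    "T Wave Abnormalities:\nST Segment Changes:\nFinal Impression:"
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},                        # placeholder for the ECG image
            {"type": "text", "text": ECG_INSTRUCTION},
        ],
    }
]
```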
### Inference Example (Python)
```python
from transformers import AutoProcessor, Gemma3nForConditionalGeneration
from PIL import Image
import torch

model_id = "yasserrmd/GemmaECG-Vision"

# Load the model in bfloat16 and move it to the GPU for inference
model = Gemma3nForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16
).eval().to("cuda")
processor = AutoProcessor.from_pretrained(model_id)

# Load the ECG image and convert it to RGB
image = Image.open("example_ecg.png").convert("RGB")

# Build a chat-style message with an image placeholder and the instruction text
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Interpret this ECG and provide a structured triage report."},
        ],
    }
]

# Render the chat template to a prompt string, then tokenize it together with the image
prompt = processor.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
inputs = processor(text=prompt, images=image, return_tensors="pt").to("cuda")

with torch.inference_mode():
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        temperature=1.0,
        top_p=0.95,
        top_k=64,
        use_cache=True,
    )

result = processor.decode(outputs[0], skip_special_tokens=True)
print(result)
```
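Note that the decoded string contains the rendered prompt as well as the generated report. If only the report is needed, the prompt tokens can be sliced off before decoding; a minimal sketch reusing `inputs` and `outputs` from the example above:
```python
# Keep only the tokens generated after the prompt, then decode them.
# Assumes `inputs` and `outputs` from the inference example above.
prompt_len = inputs["input_ids"].shape[-1]
report = processor.decode(outputs[0][prompt_len:], skip_special_tokens=True)
print(report)
```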
## Training Details
* **Framework**: Unsloth + TRL SFTTrainer
* **Hardware**: Google Colab Pro (L4)
* **Batch Size**: 2
* **Epochs**: 1
* **Learning Rate**: 2e-4
* **Scheduler**: Cosine
* **Loss**: CrossEntropy
* **Precision**: bfloat16
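For reference, the hyperparameters above roughly correspond to the following TRL configuration. This is a minimal sketch, not the exact training script; `output_dir` and the warmup/logging values are illustrative assumptions.
```python
from trl import SFTConfig

# Sketch of the listed hyperparameters expressed as a TRL SFTConfig.
# output_dir, warmup_steps, and logging_steps are illustrative assumptions.
training_args = SFTConfig(
    output_dir="gemmaecg-vision-sft",   # assumption: any local path
    per_device_train_batch_size=2,      # Batch Size: 2
    num_train_epochs=1,                 # Epochs: 1
    learning_rate=2e-4,                 # Learning Rate: 2e-4
    lr_scheduler_type="cosine",         # Scheduler: Cosine
    bf16=True,                          # Precision: bfloat16
    warmup_steps=10,                    # assumption
    logging_steps=10,                   # assumption
)
```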
## Dataset
The training dataset is a curated subset of the [PULSE-ECG/ECGInstruct](https://huggingface.co/datasets/PULSE-ECG/ECGInstruct) dataset, reformatted for VLM instruction tuning.
* 3,272 samples of ECG image + structured instruction + clinical output
* Focused on realistic and medically relevant triage cases
Dataset link: [`yasserrmd/pulse-ecg-instruct-subset`](https://huggingface.co/datasets/yasserrmd/pulse-ecg-instruct-subset)
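The subset can be loaded with the `datasets` library. A minimal sketch, assuming a `train` split; the split and column names should be verified against the dataset card:
```python
from datasets import load_dataset

# Load the curated ECG instruction subset from the Hugging Face Hub.
ds = load_dataset("yasserrmd/pulse-ecg-instruct-subset", split="train")
print(ds)             # number of rows and column names
print(ds[0].keys())   # inspect the fields of one sample
```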
### **Training Loss Summary**
<img src="tl.png" />
The model was fine-tuned over 409 steps on the `pulse-ecg-instruct-subset` dataset. The training loss started above **9.5** and steadily declined to below **0.5**, showing consistent convergence throughout the single epoch. The curve reflects a stable optimization process without overfitting spikes, and the chart above highlights how quickly the model adapted to the ECG image-to-text task.
## Intended Use
* Emergency triage in offline settings
* On-device ECG assessment
* Integration with medical edge devices (Jetson, Pi, Android)
* Rapid analysis during disaster response
## Limitations
* Not intended to replace licensed medical professionals
* Accuracy may vary depending on image quality
* Model outputs should be reviewed by a clinician before action
## License
This model is licensed under **CC BY 4.0**. You are free to use, modify, and distribute it with attribution.
## Author
Mohamed Yasser
[Hugging Face Profile](https://huggingface.co/yasserrmd)
[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)