# Greek GPT-2

## Model Details

- Base Model: Greek GPT-2 (124M parameters)
- Fine-tuning Method: Low-Rank Adaptation (LoRA)
- LoRA Configuration: Applied to the attention (`c_attn`) and projection (`c_proj`) modules
- Rank: 16
- Alpha: 32
- Trainable Parameters: 1,622,016 (≈1.29% of total)
- Optimizer: AdamW
- Training Epochs: 30
- Learning Rate: 5e-5
- Batch Size: 16 (with gradient accumulation steps = 2)
LoRA enabled efficient domain adaptation: only about 1.3% of the model's parameters were updated, while strong performance was maintained.
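For reference, the configuration above can be expressed with `peft` and `transformers` roughly as in the sketch below; `lora_dropout`, `bias`, and the output directory are assumptions not stated in this card.

```python
from transformers import AutoModelForCausalLM, TrainingArguments
from peft import LoraConfig, get_peft_model

# Base Greek GPT-2 (~124M parameters)
base_model = AutoModelForCausalLM.from_pretrained("lighteternal/gpt2-finetuned-greek")

# LoRA on the attention (c_attn) and projection (c_proj) modules
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["c_attn", "c_proj"],
    lora_dropout=0.05,   # assumed value; not stated in this card
    bias="none",         # assumed value; not stated in this card
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # ~1.6M trainable parameters (≈1.3% of total)

# Training hyperparameters listed above
training_args = TrainingArguments(
    output_dir="greek-medical-gpt2-lora",  # placeholder
    num_train_epochs=30,
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=2,
    optim="adamw_torch",  # AdamW
)
```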
## Intended Use

This model is designed for Greek medical text processing and ASR error correction in the medical domain.
Primary applications include:
- Ranking candidate transcriptions produced by Whisper for higher accuracy (a rescoring sketch follows this list)
- Domain-specific language modeling for Greek medical texts
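As a minimal illustration of the rescoring use case, the sketch below ranks a Whisper n-best list by language-model loss; it assumes `lm_model`, `lm_tokenizer`, and `device` have been loaded as in the How to Use section below, and the candidate strings are hypothetical.

```python
import torch

def lm_score(text, model, tokenizer, device):
    """Average negative log-likelihood of `text` under the language model (lower is better)."""
    inputs = tokenizer(text, return_tensors="pt").to(device)
    with torch.no_grad():
        loss = model(**inputs, labels=inputs["input_ids"]).loss
    return loss.item()

# Hypothetical n-best list from Whisper for a single utterance
candidates = [
    "Ο ασθενής παρουσιάζει συμπτώματα",   # "The patient presents symptoms"
    "Ο ασθενής παρουσιάζει συμπώματα",    # ASR error in the last word
]

# Keep the candidate the medical LM finds most probable
best_transcription = min(candidates, key=lambda c: lm_score(c, lm_model, lm_tokenizer, device))
```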
## Training Data

The model was fine-tuned on a custom Greek Medical Text Dataset (dataset link), containing 20,430 samples compiled from three sources:
- Medical E-books: Rich in clinical terminology covering diagnostics, procedures, and patient care.
- QTLP Greek CC Corpus (Medical domain): A diverse web-sourced corpus including reference texts, news articles, discussions, and commercial medical content.
- Istorima Podcast Dialogues: Transcribed podcast dialogues introducing informal, conversational medical language.
This mixture allowed the model to learn both formal medical terminology and idiomatic spoken Greek patterns relevant to ASR.
## Evaluation

### Final Perplexity Results

| Dataset | Pre-trained GPT-2 (perplexity) | Fine-tuned GPT-2 (perplexity) | Improvement (%) |
|---|---|---|---|
| Medical Texts | 45.73 | 35.36 | 22.7 |
| Speech Transcriptions | 103.21 | 67.67 | 34.4 |
| Combined (All Data) | 53.15 | 39.86 | 25.0 |
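For reference, perplexities like those above are typically computed as the exponential of the mean token-level cross-entropy; the sketch below follows that convention, with `medical_test_texts` as a placeholder for the evaluation texts.

```python
import math
import torch

def corpus_perplexity(texts, model, tokenizer, device, max_length=1024):
    """Perplexity = exp(mean cross-entropy over all predicted tokens)."""
    total_loss, total_targets = 0.0, 0
    model.eval()
    for text in texts:
        inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=max_length).to(device)
        n_targets = inputs["input_ids"].shape[1] - 1  # number of shifted prediction targets
        if n_targets <= 0:
            continue
        with torch.no_grad():
            loss = model(**inputs, labels=inputs["input_ids"]).loss  # mean loss over n_targets tokens
        total_loss += loss.item() * n_targets
        total_targets += n_targets
    return math.exp(total_loss / total_targets)

# e.g. corpus_perplexity(medical_test_texts, lm_model, lm_tokenizer, device)
```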
### Training Dynamics

Validation perplexity steadily decreased across epochs, indicating improved predictive accuracy.
| Epoch | Training Loss | Validation Loss | Perplexity |
|---|---|---|---|
| 1 | 3.95 | 4.20 | 44.99 |
| 5 | 3.89 | 4.11 | 42.03 |
| 10 | 3.81 | 4.05 | 40.22 |
| 15 | 3.83 | 4.03 | 39.29 |
| 20 | 3.77 | 4.01 | 38.70 |
| 25 | 3.77 | 4.00 | 38.33 |
| 30 | 3.78 | 3.99 | 38.22 |
## How to Use

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load tokenizer
lm_tokenizer = AutoTokenizer.from_pretrained("Vardis/Medical_Speech_Greek_GPT2")

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "lighteternal/gpt2-finetuned-greek",
    torch_dtype=torch.float16,
    device_map="auto",
)

# Load LoRA weights on top of the base model
lm_model = PeftModel.from_pretrained(base_model, "Vardis/Medical_Speech_Greek_GPT2").to(device)

# Example inference
input_text = "Ο ασθενής παρουσιάζει συμπτώματα"  # "The patient presents symptoms"
inputs = lm_tokenizer(input_text, return_tensors="pt").to(device)
outputs = lm_model.generate(**inputs, max_length=50)
print(lm_tokenizer.decode(outputs[0], skip_special_tokens=True))
```