Medical Speech Greek GPT-2

Model Details

  • Base Model: Greek GPT-2 (lighteternal/gpt2-finetuned-greek, 124M parameters)
  • Fine-tuning Method: Low-Rank Adaptation (LoRA)
  • LoRA Configuration: Applied to attention (c_attn) and projection (c_proj) modules
    • Rank: 16
    • Alpha: 32
  • Trainable Parameters: 1,622,016 (≈1.29% of total)
  • Optimizer: AdamW
  • Training Epochs: 30
  • Learning Rate: 5e-5
  • Batch Size: 16 (with gradient accumulation steps = 2)

LoRA enabled efficient domain adaptation by updating only a small fraction of the model's parameters while maintaining strong performance. A sketch of the corresponding training setup is shown below.
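
As a rough sketch, the configuration above maps onto peft and transformers as follows. Dropout and other unreported settings are assumptions, and the output path is a placeholder.

from transformers import AutoModelForCausalLM, TrainingArguments
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("lighteternal/gpt2-finetuned-greek")

lora_config = LoraConfig(
    r=16,                                  # rank, as reported above
    lora_alpha=32,
    target_modules=["c_attn", "c_proj"],   # attention and projection modules
    lora_dropout=0.05,                     # assumption: dropout not reported
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()         # ~1.6M trainable (~1.3% of total)

training_args = TrainingArguments(
    output_dir="greek-medical-gpt2-lora",  # placeholder path
    num_train_epochs=30,
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=2,         # effective batch size of 32
    optim="adamw_torch",                   # AdamW
)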


Intended Use

This model is designed for Greek medical text processing and ASR error correction in the medical domain.
Primary applications include:

  • Ranking candidate transcriptions produced by Whisper for higher accuracy (see the rescoring sketch after this list)
  • Domain-specific language modeling for Greek medical texts
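
As a concrete illustration of the ranking use case, the sketch below scores each Whisper candidate by its language-model loss (mean per-token negative log-likelihood) and returns the candidates in order of fluency. It assumes lm_model and lm_tokenizer are loaded as in the How to Use section; whisper_nbest is a hypothetical list of candidate strings.

import torch

def rank_candidates(candidates, model, tokenizer, device="cpu"):
    """Sort candidate transcriptions by LM loss (lower = more plausible)."""
    scored = []
    for text in candidates:
        inputs = tokenizer(text, return_tensors="pt").to(device)
        with torch.no_grad():
            # Causal LM loss = mean per-token negative log-likelihood
            nll = model(**inputs, labels=inputs["input_ids"]).loss.item()
        scored.append((nll, text))
    return [text for _, text in sorted(scored)]

# best_transcription = rank_candidates(whisper_nbest, lm_model, lm_tokenizer, device)[0]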

Training Data

The model was fine-tuned on a custom Greek Medical Text Dataset (dataset link), containing 20,430 samples compiled from three sources:

  1. Medical E-books: Rich in clinical terminology covering diagnostics, procedures, and patient care.
  2. QTLP Greek CC Corpus (Medical domain): A diverse web-sourced corpus including reference texts, news articles, discussions, and commercial medical content.
  3. Istorima Podcast Dialogues: Transcribed podcast dialogues introducing informal, conversational medical language.

This mixture allowed the model to learn both formal medical terminology and idiomatic spoken Greek patterns relevant to ASR.


Evaluation

Final Perplexity Results

Dataset                 Pre-trained GPT-2   Fine-tuned GPT-2   Improvement (%)
Medical Texts           45.73               35.36              22.7
Speech Transcriptions   103.21              67.67              34.4
Combined (All Data)     53.15               39.86              25.0
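
The exact evaluation protocol is not documented; a common approximation, assumed in the sketch below, is to exponentiate the token-weighted mean cross-entropy over a test corpus.

import math
import torch

def corpus_perplexity(texts, model, tokenizer, device="cpu"):
    """exp of the token-weighted mean NLL over a list of texts."""
    total_nll, total_tokens = 0.0, 0
    for text in texts:
        enc = tokenizer(text, return_tensors="pt").to(device)
        n_pred = enc["input_ids"].size(1) - 1  # loss is computed on shifted targets
        if n_pred < 1:
            continue
        with torch.no_grad():
            loss = model(**enc, labels=enc["input_ids"]).loss
        total_nll += loss.item() * n_pred
        total_tokens += n_pred
    return math.exp(total_nll / total_tokens)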

Training Dynamics

Validation perplexity decreased steadily across epochs, from 44.99 at epoch 1 to 38.22 at epoch 30, with most of the gain realized by epoch 20, indicating improved predictive accuracy.

Epoch   Training Loss   Validation Loss   Perplexity
1       3.95            4.20              44.99
5       3.89            4.11              42.03
10      3.81            4.05              40.22
15      3.83            4.03              39.29
20      3.77            4.01              38.70
25      3.77            4.00              38.33
30      3.78            3.99              38.22

How to Use

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the tokenizer from the adapter repository
lm_tokenizer = AutoTokenizer.from_pretrained("Vardis/Medical_Speech_Greek_GPT2")

# Load the Greek GPT-2 base model (fp16 on GPU, fp32 on CPU)
base_model = AutoModelForCausalLM.from_pretrained(
    "lighteternal/gpt2-finetuned-greek",
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
)

# Attach the LoRA adapter weights
lm_model = PeftModel.from_pretrained(base_model, "Vardis/Medical_Speech_Greek_GPT2").to(device)
lm_model.eval()

# Example inference
input_text = "Ο ασθενής παρουσιάζει συμπτώματα"
inputs = lm_tokenizer(input_text, return_tensors="pt").to(device)
with torch.no_grad():
    outputs = lm_model.generate(**inputs, max_new_tokens=30)
print(lm_tokenizer.decode(outputs[0], skip_special_tokens=True))
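
If you want a standalone checkpoint without the peft dependency at inference time, the LoRA weights can optionally be merged into the base model; the output directory below is a placeholder.

merged_model = lm_model.merge_and_unload()
merged_model.save_pretrained("greek-medical-gpt2-merged")
lm_tokenizer.save_pretrained("greek-medical-gpt2-merged")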