# Whisper Large v3 Turbo (Albanian Fine-Tuned) - v2

This is a fine-tuned version of the Whisper Large v3 Turbo model, optimized for Albanian speech-to-text transcription. It achieves a Word Error Rate (WER) of **6.98%** on a held-out evaluation set.
## Model Details

- **Base Model**: `openai/whisper-large-v3-turbo`
- **Language**: Albanian (`sq`)
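
As a minimal inference sketch, the model can be used with the `transformers` pipeline. The repository id below is a placeholder, since this card does not state the model's Hub id.

```python
import torch
from transformers import pipeline

# Placeholder id -- substitute the actual Hub id of this fine-tuned model.
MODEL_ID = "<your-username>/whisper-large-v3-turbo-sq"

asr = pipeline(
    "automatic-speech-recognition",
    model=MODEL_ID,
    torch_dtype=torch.float16,
    device="cuda:0" if torch.cuda.is_available() else "cpu",
)

# Force Albanian transcription ("sq", matching the language code above).
result = asr("sample.wav", generate_kwargs={"language": "sq", "task": "transcribe"})
print(result["text"])
```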
## Training Dataset

- **Source**: Mozilla Common Voice version 19 (available on the Hugging Face Hub as `Kushtrim/common_voice_19_sq`)
- **Description**: Audio clips of spoken Albanian, ranging from 5 to 30 seconds each.
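
As a rough sketch, the dataset can be pulled with the `datasets` library; the `audio` column name is an assumption based on how Common Voice releases are usually structured.

```python
from datasets import Audio, load_dataset

# Load the Albanian Common Voice 19 mirror named above.
ds = load_dataset("Kushtrim/common_voice_19_sq")

# Whisper models expect 16 kHz input; resample lazily on access.
# (Assumes an "audio" column, as in standard Common Voice releases.)
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))
print(ds)
```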
## Training Details

The model was fine-tuned on an NVIDIA A100 GPU (40 GB) using the `transformers` library. The key training arguments are listed below, followed by a code sketch of the same configuration:
| Argument                        | Value   | Description                                              |
|---------------------------------|---------|----------------------------------------------------------|
| `per_device_train_batch_size`   | 8       | Training batch size per GPU                              |
| `per_device_eval_batch_size`    | 2       | Evaluation batch size per GPU                            |
| `gradient_accumulation_steps`   | 1       | Steps to accumulate gradients (effective batch size = 8) |
| `num_train_epochs`              | 3       | Number of training epochs                                |
| `learning_rate`                 | 1e-5    | Initial learning rate                                    |
| `warmup_steps`                  | 300     | Number of learning-rate warmup steps                     |
| `evaluation_strategy`           | "steps" | Evaluate every `eval_steps` during training              |
| `eval_steps`                    | 250     | Evaluation frequency (every 250 steps)                   |
| `fp16`                          | True    | Use mixed-precision (16-bit) training                    |
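
A sketch of these arguments as they might be passed to `Seq2SeqTrainingArguments` (transformers 4.38); `output_dir` and `predict_with_generate` are not in the table above and are assumptions.

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-large-v3-turbo-sq",  # placeholder path, not from the card
    per_device_train_batch_size=8,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=1,
    num_train_epochs=3,
    learning_rate=1e-5,
    warmup_steps=300,
    evaluation_strategy="steps",
    eval_steps=250,
    fp16=True,
    predict_with_generate=True,  # assumption: decode text during eval so WER can be computed
)
```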

- Total steps: ~3,540 (training completed at step 3,500)
- Hardware: NVIDIA A100 (40 GB)
- Libraries:
  - `transformers==4.38.2`
  - `torch==2.2.1`
## Performance

| Step | Training Loss | Validation Loss | WER    |
|------|---------------|-----------------|--------|
| 250  | 0.4744        | 0.3991          | 34.03% |
| 500  | 0.3421        | 0.3426          | 30.42% |
| 750  | 0.2871        | 0.2808          | 26.09% |
| 1000 | 0.2401        | 0.2258          | 21.31% |
| 1250 | 0.1809        | 0.1998          | 19.15% |
| 1500 | 0.1142        | 0.1827          | 17.33% |
| 1750 | 0.1051        | 0.1611          | 15.19% |
| 2000 | 0.0930        | 0.1464          | 13.82% |
| 2250 | 0.0827        | 0.1313          | 11.79% |
| 2500 | 0.0420        | 0.1139          | 10.50% |
| 2750 | 0.0330        | 0.1124          | 9.58%  |
| 3000 | 0.0255        | 0.1006          | 8.38%  |
| 3250 | 0.0256        | 0.0905          | 7.48%  |
| 3500 | 0.0204        | 0.0889          | 6.98%  |

- **Final WER**: **6.98%** (at step 3,500)
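
For reference, WER scores like those above can be computed with the `evaluate` library; this is an assumption about tooling, since the card does not say which implementation produced the numbers.

```python
import evaluate

# WER = (substitutions + insertions + deletions) / reference word count.
wer_metric = evaluate.load("wer")

references = ["përshëndetje si jeni"]   # ground-truth transcript (illustrative)
predictions = ["përshëndetje si jeni"]  # model output (illustrative)

score = wer_metric.compute(references=references, predictions=predictions)
print(f"WER: {100 * score:.2f}%")
```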