# Whisper Large v3 Turbo (Albanian Fine-Tuned) - v2

This is a fine-tuned version of the Whisper Large v3 Turbo model, optimized for Albanian speech-to-text transcription. It achieves a Word Error Rate (WER) of **6.98%** on a held-out evaluation set.
## Model Details

- **Base Model**: `openai/whisper-large-v3-turbo`
- **Language**: Albanian (`sq`)
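
As a minimal inference sketch, the model can be used with the `transformers` pipeline. The repository id below is a placeholder, since this card does not state the model's Hub id.

```python
import torch
from transformers import pipeline

# Placeholder id -- substitute the actual Hub id of this fine-tuned model.
MODEL_ID = "<your-username>/whisper-large-v3-turbo-sq"

asr = pipeline(
    "automatic-speech-recognition",
    model=MODEL_ID,
    torch_dtype=torch.float16,
    device="cuda:0" if torch.cuda.is_available() else "cpu",
)

# Force Albanian transcription ("sq", matching the language code above).
result = asr("sample.wav", generate_kwargs={"language": "sq", "task": "transcribe"})
print(result["text"])
```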
## Training Dataset

- **Source**: Mozilla Common Voice version 19 (available on the Hugging Face Hub as `Kushtrim/common_voice_19_sq`)
- **Description**: Audio clips of spoken Albanian, ranging from 5 to 30 seconds each.
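
As a rough sketch, the dataset can be pulled with the `datasets` library; the `audio` column name is an assumption based on how Common Voice releases are usually structured.

```python
from datasets import Audio, load_dataset

# Load the Albanian Common Voice 19 mirror named above.
ds = load_dataset("Kushtrim/common_voice_19_sq")

# Whisper models expect 16 kHz input; resample lazily on access.
# (Assumes an "audio" column, as in standard Common Voice releases.)
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))
print(ds)
```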
## Training Details

The model was fine-tuned on an NVIDIA A100 GPU (40 GB) using the `transformers` library. The key training arguments are listed below, followed by a code sketch of the same configuration:
| Argument                        | Value   | Description                                              |
|---------------------------------|---------|----------------------------------------------------------|
| `per_device_train_batch_size`   | 8       | Training batch size per GPU                              |
| `per_device_eval_batch_size`    | 2       | Evaluation batch size per GPU                            |
| `gradient_accumulation_steps`   | 1       | Steps to accumulate gradients (effective batch size = 8) |
| `num_train_epochs`              | 3       | Number of training epochs                                |
| `learning_rate`                 | 1e-5    | Initial learning rate                                    |
| `warmup_steps`                  | 300     | Number of learning-rate warmup steps                     |
| `evaluation_strategy`           | "steps" | Evaluate every `eval_steps` during training              |
| `eval_steps`                    | 250     | Evaluation frequency (every 250 steps)                   |
| `fp16`                          | True    | Use mixed-precision (16-bit) training                    |
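
A sketch of these arguments as they might be passed to `Seq2SeqTrainingArguments` (transformers 4.38); `output_dir` and `predict_with_generate` are not in the table above and are assumptions.

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-large-v3-turbo-sq",  # placeholder path, not from the card
    per_device_train_batch_size=8,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=1,
    num_train_epochs=3,
    learning_rate=1e-5,
    warmup_steps=300,
    evaluation_strategy="steps",
    eval_steps=250,
    fp16=True,
    predict_with_generate=True,  # assumption: decode text during eval so WER can be computed
)
```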

- Total steps: ~3,540 (training completed at step 3,500)
- Hardware: NVIDIA A100 (40 GB)
- Libraries:
  - `transformers==4.38.2`
  - `torch==2.2.1`
## Performance

| Step | Training Loss | Validation Loss | WER    |
|------|---------------|-----------------|--------|
| 250  | 0.4744        | 0.3991          | 34.03% |
| 500  | 0.3421        | 0.3426          | 30.42% |
| 750  | 0.2871        | 0.2808          | 26.09% |
| 1000 | 0.2401        | 0.2258          | 21.31% |
| 1250 | 0.1809        | 0.1998          | 19.15% |
| 1500 | 0.1142        | 0.1827          | 17.33% |
| 1750 | 0.1051        | 0.1611          | 15.19% |
| 2000 | 0.0930        | 0.1464          | 13.82% |
| 2250 | 0.0827        | 0.1313          | 11.79% |
| 2500 | 0.0420        | 0.1139          | 10.50% |
| 2750 | 0.0330        | 0.1124          | 9.58%  |
| 3000 | 0.0255        | 0.1006          | 8.38%  |
| 3250 | 0.0256        | 0.0905          | 7.48%  |
| 3500 | 0.0204        | 0.0889          | 6.98%  |

- **Final WER**: **6.98%** (at step 3,500)
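
For reference, WER scores like those above can be computed with the `evaluate` library; this is an assumption about tooling, since the card does not say which implementation produced the numbers.

```python
import evaluate

# WER = (substitutions + insertions + deletions) / reference word count.
wer_metric = evaluate.load("wer")

references = ["përshëndetje si jeni"]   # ground-truth transcript (illustrative)
predictions = ["përshëndetje si jeni"]  # model output (illustrative)

score = wer_metric.compute(references=references, predictions=predictions)
print(f"WER: {100 * score:.2f}%")
```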