# Whisper Large v3 Turbo (Albanian Fine-Tuned) - v2

This is a fine-tuned version of the Whisper Large v3 Turbo model, optimized for Albanian speech-to-text transcription. It achieves a Word Error Rate (WER) of **6.98%** on a held-out evaluation set.

## Model Details

- **Base Model**: `openai/whisper-large-v3-turbo`
- **Language**: Albanian (`sq`)

## Training Dataset

- **Source**: Mozilla Common Voice version 19 (available on the Hugging Face Hub as **Kushtrim/common_voice_19_sq**)
- **Description**: Audio clips of spoken Albanian, ranging from 5 to 30 seconds.

## Training Details

The model was fine-tuned on an NVIDIA A100 GPU (40 GB) using the `transformers` library. Below are the key training arguments:

| Argument | Value | Description |
|-------------------------------|---------|----------------------------------------------------------|
| `per_device_train_batch_size` | 8 | Training batch size per GPU |
| `per_device_eval_batch_size` | 2 | Evaluation batch size per GPU |
| `gradient_accumulation_steps` | 1 | Steps to accumulate gradients (effective batch size = 8) |
| `num_train_epochs` | 3 | Number of training epochs |
| `learning_rate` | 1e-5 | Initial learning rate |
| `warmup_steps` | 300 | Number of learning-rate warmup steps |
| `evaluation_strategy` | "steps" | Evaluate every `eval_steps` during training |
| `eval_steps` | 250 | Frequency of evaluation (every 250 steps) |
| `fp16` | True | Use mixed-precision training (16-bit floats) |

- Total Steps: ~3,540 (completed 3,500)
- Hardware: NVIDIA A100 (40 GB)
- Libraries:
  - `transformers==4.38.2`
  - `torch==2.2.1`

## Performance

| Step | Training Loss | Validation Loss | WER |
|------|---------------|-----------------|--------|
| 250 | 0.4744 | 0.3991 | 34.03% |
| 500 | 0.3421 | 0.3426 | 30.42% |
| 750 | 0.2871 | 0.2808 | 26.09% |
| 1000 | 0.2401 | 0.2258 | 21.31% |
| 1250 | 0.1809 | 0.1998 | 19.15% |
| 1500 | 0.1142 | 0.1827 | 17.33% |
| 1750 | 0.1051 | 0.1611 | 15.19% |
| 2000 | 0.0930 | 0.1464 | 13.82% |
| 2250 | 0.0827 | 0.1313 | 11.79% |
| 2500 | 0.0420 | 0.1139 | 10.50% |
| 2750 | 0.0330 | 0.1124 | 9.58% |
| 3000 | 0.0255 | 0.1006 | 8.38% |
| 3250 | 0.0256 | 0.0905 | 7.48% |
| 3500 | 0.0204 | 0.0889 | 6.98% |

- **Final WER**: **6.98%** (at step 3500)
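The WER values above are the standard word-level edit-distance metric: (substitutions + deletions + insertions) divided by the number of reference words. As a minimal, self-contained sketch of how the metric works (not the exact evaluation script used for this model, which would typically use a library such as `evaluate` or `jiwer`):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: (substitutions + deletions + insertions) / reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Word-level Levenshtein distance via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
    return d[len(ref)][len(hyp)] / len(ref)

# One substituted word in a 4-word reference -> WER = 0.25 (i.e. 25%)
print(wer("si je ti sot", "si je ai sot"))  # 0.25
```

A final WER of 6.98% therefore means roughly 7 word errors per 100 reference words on the held-out set.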