# Model Card for Kabyle Speech-to-Text
This project provides a speech recognition model (STT - Speech To Text) for the Kabyle language.
## Model Description
This model was trained for 30 epochs; the best checkpoint was saved at epoch 20, with a validation loss of 0.1513. The training script saves the best model version according to the validation loss and stops early after ten epochs without improvement of this value, keeping the last best version.
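The early-stopping behaviour described above can be sketched as follows. This is a minimal illustration, not the repository's actual code; the class and variable names are hypothetical, and the loss curve is synthetic:

```python
class EarlyStopping:
    """Stop training after `patience` epochs without val-loss improvement."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best_loss = float("inf")
        self.epochs_without_improvement = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss  # new best: this is where the checkpoint would be saved
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience

# Synthetic loss curve: improves until epoch 20 (best = 0.1513), then plateaus,
# so the patience counter triggers a stop at epoch 30.
stopper = EarlyStopping(patience=10)
stopped_at = None
for epoch in range(1, 31):
    val_loss = max(0.1513, 1.5 - 0.0675 * epoch)
    if stopper.step(val_loss):
        stopped_at = epoch
        break
```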
- Developed by: A.S.
- License: MIT
## Model Sources
- Repository: https://github.com/aqvaylis/kabyle-speech-to-text
- Demo: https://huggingface.co/spaces/aqvaylis/kabyle-speech-to-text
## Uses
Check the GitHub repository for more details.
## Recommendations
- Dataset Volume: a minimum of 5,000 audio files with their transcriptions is necessary to obtain a coherent model.
- Data Format:
  - Use a semicolon (`;`) as the separator between the audio filename and its transcription.
  - Replace any semicolons in your sentences with spaces before integrating them into the transcription file.
- Performance Optimization: adjust the script parameters according to your GPU capabilities:
  - `BATCH_SIZE`
  - `num_worker`
  - `prefetch_factor`
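The data-format rules above can be sketched as follows. This is a minimal illustration: the manifest filename, helper names, and sample sentences are hypothetical, not taken from the repository:

```python
def sanitize(sentence: str) -> str:
    """Replace semicolons with spaces so they cannot collide with the field separator."""
    return sentence.replace(";", " ")

def write_manifest(path, entries):
    """Write one 'filename;transcription' line per audio clip."""
    with open(path, "w", encoding="utf-8") as f:
        for filename, sentence in entries:
            f.write(f"{filename};{sanitize(sentence)}\n")

def read_manifest(path):
    """Parse the manifest back into (filename, transcription) pairs."""
    pairs = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            # Split on the first semicolon only: the left part is the filename.
            filename, transcription = line.rstrip("\n").split(";", 1)
            pairs.append((filename, transcription))
    return pairs

entries = [
    ("clip_0001.wav", "Azul fell-awen"),
    ("clip_0002.wav", "Tanemmirt; ar tufat"),  # the semicolon becomes a space
]
write_manifest("train.csv", entries)
```

The `BATCH_SIZE`, `num_worker`, and `prefetch_factor` parameters mentioned above correspond to the `batch_size`, `num_workers`, and `prefetch_factor` arguments of PyTorch's `DataLoader`; raising them increases throughput at the cost of GPU and CPU memory.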
## How to Get Started with the Model
Check the GitHub repository for more details.
## Training Details
### Training Data
The pre-trained model `best_kabyle_asr_optim.pt` was trained on more than 700,000 audio clips with their textual transcriptions, sourced from Common Voice and Tatoeba. Check the GitHub repository for more details.
### Training Procedure
Check the GitHub repository for more details.
## Environmental Impact
Carbon emissions were estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
- Hardware Type: A100 80 GB PCIe (see Technical Specifications)
- Hours used: 48 hours
- Cloud Provider: Hyperstack Cloud
- Carbon Emitted: 5.18 kg CO₂ eq.
## Technical Specifications
### Hardware
- GPU: A100 80 GB PCIe
- CPUs: 30
- RAM: 120 GB
- Disk: 100 GB
### Software
- NVIDIA driver ≥ 530.xx
- CUDA Toolkit 12.1 (for cu121)
- Python 3.8+
- PyTorch
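Assuming the CUDA 12.1 (cu121) PyTorch build listed above, the environment can be set up roughly as follows. These commands are illustrative; check the GitHub repository for the exact requirements:

```shell
# Verify the NVIDIA driver (≥ 530.xx) and the CUDA version it supports
nvidia-smi

# Install a CUDA 12.1 build of PyTorch from the cu121 wheel index
pip install torch --index-url https://download.pytorch.org/whl/cu121
```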