# Model Card for Kabyle Speech-to-Text
This project provides a speech recognition model (STT - Speech To Text) for the Kabyle language.
## Model Description
This model was trained for 30 epochs; the best checkpoint was saved at epoch 20, with a validation loss of 0.1513. The training script saves the best model version according to the validation loss and stops early after ten epochs without improvement of this value, keeping the last best version.
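The early-stopping behaviour described above can be sketched as follows. This is a minimal illustration, not the repository's actual code; the class and variable names are hypothetical, and the loss curve is synthetic:

```python
class EarlyStopping:
    """Stop training after `patience` epochs without val-loss improvement."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best_loss = float("inf")
        self.epochs_without_improvement = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss  # new best: this is where the checkpoint would be saved
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience

# Synthetic loss curve: improves until epoch 20 (best = 0.1513), then plateaus,
# so the patience counter triggers a stop at epoch 30.
stopper = EarlyStopping(patience=10)
stopped_at = None
for epoch in range(1, 31):
    val_loss = max(0.1513, 1.5 - 0.0675 * epoch)
    if stopper.step(val_loss):
        stopped_at = epoch
        break
```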
- Developed by: A.S.
- License: MIT
## Model Sources
- Repository: https://github.com/aqvaylis/kabyle-speech-to-text
- Demo: https://huggingface.co/spaces/aqvaylis/kabyle-speech-to-text
## Uses
Check the GitHub repository for more details.
## Recommendations
- Dataset Volume: a minimum of 5,000 audio files with their transcriptions is necessary to obtain a coherent model.
- Data Format:
  - Use a semicolon (`;`) as the separator between the audio filename and its transcription.
  - Replace any semicolons in your sentences with spaces before integrating them into the transcription file.
- Performance Optimization: adjust the script parameters according to your GPU capabilities:
  - `BATCH_SIZE`
  - `num_worker`
  - `prefetch_factor`
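The data-format rules above can be sketched as follows. This is a minimal illustration: the manifest filename, helper names, and sample sentences are hypothetical, not taken from the repository:

```python
def sanitize(sentence: str) -> str:
    """Replace semicolons with spaces so they cannot collide with the field separator."""
    return sentence.replace(";", " ")

def write_manifest(path, entries):
    """Write one 'filename;transcription' line per audio clip."""
    with open(path, "w", encoding="utf-8") as f:
        for filename, sentence in entries:
            f.write(f"{filename};{sanitize(sentence)}\n")

def read_manifest(path):
    """Parse the manifest back into (filename, transcription) pairs."""
    pairs = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            # Split on the first semicolon only: the left part is the filename.
            filename, transcription = line.rstrip("\n").split(";", 1)
            pairs.append((filename, transcription))
    return pairs

entries = [
    ("clip_0001.wav", "Azul fell-awen"),
    ("clip_0002.wav", "Tanemmirt; ar tufat"),  # the semicolon becomes a space
]
write_manifest("train.csv", entries)
```

The `BATCH_SIZE`, `num_worker`, and `prefetch_factor` parameters mentioned above correspond to the `batch_size`, `num_workers`, and `prefetch_factor` arguments of PyTorch's `DataLoader`; raising them increases throughput at the cost of GPU and CPU memory.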
## How to Get Started with the Model
Check the GitHub repository for more details.
## Training Details
### Training Data
The pre-trained model `best_kabyle_asr_optim.pt` was trained on more than 700,000 audio clips with their textual transcriptions, sourced from Common Voice and Tatoeba. Check the GitHub repository for more details.
### Training Procedure
Check the GitHub repository for more details.
## Environmental Impact
Carbon emissions were estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
- Hardware Type: A100 80 GB PCIe (see Technical Specifications)
- Hours used: 48 hours
- Cloud Provider: Hyperstack Cloud
- Carbon Emitted: 5.18 kg CO₂ eq.
## Technical Specifications
### Hardware
- GPU: A100 80 GB PCIe
- CPUs: 30
- RAM: 120 GB
- Disk: 100 GB
### Software
- NVIDIA driver ≥ 530.xx
- CUDA Toolkit 12.1 (for cu121)
- Python 3.8+
- PyTorch
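Assuming the CUDA 12.1 (cu121) PyTorch build listed above, the environment can be set up roughly as follows. These commands are illustrative; check the GitHub repository for the exact requirements:

```shell
# Verify the NVIDIA driver (≥ 530.xx) and the CUDA version it supports
nvidia-smi

# Install a CUDA 12.1 build of PyTorch from the cu121 wheel index
pip install torch --index-url https://download.pytorch.org/whl/cu121
```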