Model Card for the Kabyle ASR (Speech-to-Text) Model

This project provides a speech recognition model (STT - Speech To Text) for the Kabyle language.

Model Description

This model was trained for 30 epochs, with the best version saved at epoch 20 (validation loss = 0.1513). The training script saves the best model according to the validation loss and stops after ten consecutive epochs without improvement, keeping the last best version.

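The save-and-stop behaviour described above is standard best-checkpoint early stopping. The sketch below is a minimal illustration of that loop, assuming hypothetical train_one_epoch and evaluate helpers and a plain PyTorch model; it is not the repository's exact training script.

```python
import torch

PATIENCE = 10       # stop after 10 epochs without improvement in validation loss
MAX_EPOCHS = 30

best_val_loss = float("inf")
epochs_without_improvement = 0

for epoch in range(1, MAX_EPOCHS + 1):
    train_one_epoch(model, train_loader, optimizer)  # hypothetical training step
    val_loss = evaluate(model, val_loader)           # hypothetical validation step

    if val_loss < best_val_loss:
        # New best checkpoint: save it and reset the patience counter.
        best_val_loss = val_loss
        epochs_without_improvement = 0
        torch.save(model.state_dict(), "best_kabyle_asr_optim.pt")
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= PATIENCE:
            # Ten consecutive epochs without improvement: keep the last best version and stop.
            break
```
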
  • Developed by: A.S.
  • License: MIT

Uses

Check the GitHub repository for more details.

Recommendations

  • Dataset Volume: A minimum of 5,000 audio files with their transcriptions is necessary to obtain a coherent model.
  • Data Format:
    • Use a semicolon (;) as a separator between the audio filename and its transcription
    • Replace any semicolons with spaces in your sentences before integrating them into the transcription file
  • Performance Optimization: Adjust the script parameters according to your GPU capabilities (a minimal sketch follows this list):
    • BATCH_SIZE
    • num_workers
    • prefetch_factor

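As an illustration of the data-format and performance recommendations above, the sketch below builds a dataset from a semicolon-separated transcription file and a DataLoader whose BATCH_SIZE, num_workers and prefetch_factor can be tuned to your hardware. The file name transcriptions.csv, the load_audio helper and the parameter values are assumptions for illustration, not the repository's exact code.

```python
import torch
from torch.utils.data import Dataset, DataLoader

BATCH_SIZE = 16        # adjust to GPU memory
NUM_WORKERS = 8        # adjust to available CPU cores
PREFETCH_FACTOR = 4    # batches pre-loaded per worker

class KabyleASRDataset(Dataset):
    """Reads lines of the form '<audio filename>;<transcription>'."""

    def __init__(self, transcript_path):
        self.samples = []
        with open(transcript_path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                # Split only on the first semicolon; the transcription itself
                # must not contain ';' (replace any with spaces beforehand).
                filename, text = line.split(";", 1)
                self.samples.append((filename, text))

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        filename, text = self.samples[idx]
        waveform = load_audio(filename)   # hypothetical audio-loading helper
        return waveform, text

loader = DataLoader(
    KabyleASRDataset("transcriptions.csv"),   # assumed file name
    batch_size=BATCH_SIZE,
    num_workers=NUM_WORKERS,
    prefetch_factor=PREFETCH_FACTOR,
    shuffle=True,
)
```
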
How to Get Started with the Model

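Check the GitHub repository for usage instructions. As a minimal sketch, assuming the published checkpoint stores a plain PyTorch state dict and that the model class from the repository is importable, loading it might look like this (KabyleASRModel is a placeholder name, not the repository's actual class):

```python
import torch

# Hypothetical model class from the GitHub repository; the real class name,
# constructor arguments and audio preprocessing are defined there.
from model import KabyleASRModel

device = "cuda" if torch.cuda.is_available() else "cpu"

model = KabyleASRModel()
state_dict = torch.load("best_kabyle_asr_optim.pt", map_location=device)
model.load_state_dict(state_dict)
model.to(device).eval()
```
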
Training Details
Training Data

The pre-trained model best_kabyle_asr_optim.pt was trained on more than 700,000 audio sentences with their text transcriptions, drawn from Common Voice and Tatoeba. Check the GitHub repository for more details.

Training Procedure

Check the GitHub repository for more details.

Environmental Impact

Carbon emitted: 5.18 kg CO₂ eq.

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

  • Hardware Type: A100 80 GB PCIe GPU (see Technical Specifications below)
  • Hours used: 48 hours
  • Cloud Provider: Hyperstack Cloud
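
The calculator's estimate essentially multiplies runtime, power draw and the grid's carbon intensity. The sketch below is only an illustrative back-of-the-envelope check; the 300 W board power and the grid intensity are assumed values chosen to roughly reproduce the reported 5.18 kg, not measurements.

```python
# Illustrative estimate: assumed values, not measured ones.
hours = 48                 # reported training time
power_kw = 0.300           # assumed A100 80 GB PCIe board power (300 W TDP)
carbon_intensity = 0.36    # assumed grid intensity, kg CO2 eq. per kWh

energy_kwh = hours * power_kw               # 14.4 kWh
emissions_kg = energy_kwh * carbon_intensity
print(f"{emissions_kg:.2f} kg CO2 eq.")     # ≈ 5.18 kg
```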

Technical Specifications

Hardware
  • GPU: A100 80 GB PCIe
  • CPUs: 30
  • RAM: 120 GB
  • Disk: 100 GB

Software
  • NVIDIA driver ≥ 530.xx
  • CUDA Toolkit 12.1 (for cu121)
  • Python 3.8+
  • PyTorch
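
A quick way to confirm that your environment matches these requirements is to inspect the PyTorch build from Python. This is a generic sanity check, not a script from the repository:

```python
import sys
import torch

# Sanity-check the software environment (Python 3.8+, PyTorch built for CUDA 12.1).
print("Python:", sys.version.split()[0])
print("PyTorch:", torch.__version__)          # e.g. a '+cu121' build
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```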