Soloni TDT-CTC 114M Bambara


soloba-ctc-0.6b-v0 is a fine-tuned version of nvidia/parakeet-ctc-0.6b, trained on RobotsMali/kunkado and RobotsMali/bam-asr-early. The model produces capitalization but not punctuation. It was fine-tuned using NVIDIA NeMo.

The model does not tag code-switched expressions in its transcriptions: for training, we decided to treat them as a modern variant of the Bambara language and removed all tags and markup.

🚨 Important Note

This model, along with its associated resources, is part of an ongoing research effort; improvements and refinements are expected in future versions. A human evaluation report of the model is coming soon. Users should be aware that:

  • The model may not generalize well across all speaking conditions and dialects.
  • Community feedback is welcome, and contributions are encouraged to refine the model further.

NVIDIA NeMo: Training

To fine-tune or experiment with the model, you will need to install NVIDIA NeMo. We recommend installing it after you have installed the latest PyTorch version.

pip install "nemo_toolkit[asr]"

How to Use This Model

Note that this model has been released for research purposes primarily.

Load Model with NeMo

import nemo.collections.asr as nemo_asr
asr_model = nemo_asr.models.ASRModel.from_pretrained(model_name="RobotsMali/soloba-ctc-0.6b-v0")

Transcribe Audio

asr_model.eval()
# Assuming you have a test audio file named sample_audio.wav
asr_model.transcribe(['sample_audio.wav'])

Input

This model accepts mono-channel audio (wav files) as input and resamples it to a 16 kHz sample rate before performing the forward pass.
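Before transcribing, it can help to confirm that a file is actually mono and to see whether it will be resampled. The following is a minimal sketch using only Python's standard `wave` module, which reads uncompressed PCM wav files only. The helper names `describe_wav`, `is_mono`, and `needs_resampling` are hypothetical and are not part of NeMo.

```python
import wave

# Sample rate the model card says audio is resampled to.
TARGET_SAMPLE_RATE = 16_000


def describe_wav(path):
    """Return (channels, sample_rate, duration_seconds) for a PCM wav file."""
    with wave.open(path, "rb") as wav_file:
        channels = wav_file.getnchannels()
        sample_rate = wav_file.getframerate()
        duration = wav_file.getnframes() / sample_rate
    return channels, sample_rate, duration


def is_mono(path):
    """True when the file has exactly one audio channel."""
    channels, _, _ = describe_wav(path)
    return channels == 1


def needs_resampling(path):
    """True when the file's sample rate differs from the 16 kHz target."""
    _, sample_rate, _ = describe_wav(path)
    return sample_rate != TARGET_SAMPLE_RATE
```

A stereo recording would need to be downmixed to one channel before being passed to the model; that step is outside the scope of this sketch.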

Output

This model provides the transcribed speech as a string for a given speech sample; under nemo>=2.3, it returns a Hypothesis object.
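Because the return type depends on the NeMo version, downstream code may want to handle both cases. The sketch below assumes that each result is either a plain string or an object exposing the transcript in a `text` attribute, as NeMo's Hypothesis does. The helper names `hypothesis_text` and `transcripts` are hypothetical.

```python
def hypothesis_text(result):
    """Return the transcript from a plain string or an object with a `.text` attribute."""
    if isinstance(result, str):
        return result
    # Assumption: non-string results expose the transcript as `.text`.
    return result.text


def transcripts(results):
    """Convert a list of transcription results into a list of plain strings."""
    return [hypothesis_text(result) for result in results]
```

With this helper, `transcripts(asr_model.transcribe(['sample_audio.wav']))` would yield a list of strings under either return type.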

Model Architecture

This model uses a FastConformer encoder and a CTC decoder. FastConformer is an optimized version of the Conformer model with 8x depthwise-separable convolutional downsampling. You may find more information on the details of FastConformer here: Fast-Conformer Model. The decoder is a convolutional neural network trained with the Connectionist Temporal Classification (CTC) loss.
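To make the two ideas above concrete, here is a small illustrative sketch in plain Python. It is not NeMo's implementation. The first function shows the arithmetic of 8x downsampling, assuming a 10 ms feature stride (an assumption, not stated in this card). The second shows greedy CTC decoding on a sequence of per-frame token ids: merge consecutive repeats, then drop the blank token. The blank index is passed in as a parameter because its value depends on the tokenizer.

```python
def encoder_frame_ms(feature_stride_ms=10.0, downsampling_factor=8):
    """Duration covered by one encoder output frame.

    Assumption: input features are computed every 10 ms; 8x downsampling
    then yields one encoder frame per 80 ms of audio.
    """
    return feature_stride_ms * downsampling_factor


def ctc_greedy_collapse(frame_ids, blank_id):
    """Greedy CTC decoding rule: merge repeated ids, then remove blanks."""
    decoded = []
    previous = None
    for token_id in frame_ids:
        # Keep a token only when it starts a new run and is not the blank.
        if token_id != previous and token_id != blank_id:
            decoded.append(token_id)
        previous = token_id
    return decoded
```

Note that a blank between two identical ids keeps them as separate tokens, which is how CTC represents genuinely repeated symbols.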

Training

The NeMo toolkit (version 2.3.0) was used to fine-tune this model for 183,086 steps, starting from the nvidia/parakeet-ctc-0.6b model. This version was trained with this base config. The full training configurations, scripts, and experimental logs are available here:

🔗 Bambara-ASR Experiments

The tokenizers for these models were built using the text transcripts of the train set with this script.

Dataset

This model was fine-tuned on the semi-labelled subset of the kunkado dataset, which consists of ~120 hours of automatically annotated Bambara speech data, and on the bam-asr-early dataset.

Performance

We report the Word Error Rate on the test set of bam-asr-early.

Decoder (Version) | Tokenizer | Vocabulary Size | bam-asr-early
v0 | BPE | 512 | 35.16
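For readers unfamiliar with the metric, the following sketch shows how a Word Error Rate can be computed for a single reference/hypothesis pair, using a word-level edit distance. It is an illustration only and not the evaluation script used for the figure above. It assumes whitespace tokenization and reports the result as a percentage. Corpus-level WER is usually computed by summing errors and reference words over all utterances rather than averaging per-utterance scores.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by reference length, as a percentage."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # previous_row[j] = edit distance between the processed reference
    # prefix and the first j hypothesis words.
    previous_row = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current_row = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = previous_row[j - 1] + (ref_word != hyp_word)
            insertion = current_row[j - 1] + 1
            deletion = previous_row[j] + 1
            current_row.append(min(substitution, insertion, deletion))
        previous_row = current_row

    return 100.0 * previous_row[-1] / len(ref_words)
```

Because insertions count as errors, this value can exceed 100 when the hypothesis is much longer than the reference.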

License

This model is released under the CC-BY-4.0 license. By using this model, you agree to the terms of the license.


Feel free to open a discussion on Hugging Face or file an issue on GitHub if you have any contributions.

