Found a better dataset for testing
I have identified two issues with https://huggingface.co/datasets/urdu-asr/csalt-voice. First, the recordings contain multiple speakers, who sometimes overlap. Second, the audio clips are quite long, which makes it difficult to evaluate the small model accurately. For these reasons, I am switching to a better dataset: https://huggingface.co/datasets/HowMannyMore/urdu-audiodataset. What do you think?
I am running the testing script now; let's see if it improves the results.
It appears that the dataset is simply a duplicate of the Urdu subset of the Mozilla Common Voice dataset.
It has been sourced from Mozilla's Common Voice, a publicly available voice dataset that relies on the contributions of volunteers from various parts of the world.
Yes.
I got this result:
WER: 1.776%
CER: 0.811%
BLEU: 97.636%
ChrF: 99.146
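For anyone reading these numbers later: WER and CER are both edit-distance ratios, computed over words and characters respectively. Below is a minimal sketch of that computation in plain Python. It is not the testing script used in this thread; the function names `edit_distance`, `wer`, and `cer` are my own, and it scores a single utterance with no text normalization, whereas evaluation scripts usually aggregate over the whole test set.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate in percent for one reference/hypothesis pair."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate in percent for one reference/hypothesis pair."""
    if not reference:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical transcripts, for illustration only.
    ref = "the cat sat on the mat"
    hyp = "the cat sat on mat"
    print(f"WER: {wer(ref, hyp):.3f}%")
    print(f"CER: {cer(ref, hyp):.3f}%")
```

Because both metrics divide by the reference length, a WER below 2% means the model reproduces almost every reference word exactly.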
Just wasted my time.
I am sorry to bother you again.