---
base_model: openai/whisper-large-v3
datasets:
  - gl
language: gl
library_name: transformers
license: apache-2.0
model-index:
  - name: Finetuned openai/whisper-large-v3 on Galician
    results:
      - task:
          type: automatic-speech-recognition
          name: Speech-to-Text
        dataset:
          name: Common Voice (Galician)
          type: common_voice
        metrics:
          - type: wer
            value: 5.143
---

Finetuned openai/whisper-large-v3 on 116954 Galician training audio samples from cv-corpus-21.0-2025-03-14/gl.

This model was created from the Mozilla.ai Blueprint: speech-to-text-finetune.
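A minimal inference sketch, assuming the checkpoint is loaded through the `transformers` ASR pipeline (the card lists `transformers` as its library). The repository id `your-username/whisper-large-v3-gl` is a hypothetical placeholder, since this card does not state the published model id; replace it with the real one. Decoding a file path also assumes `ffmpeg` is installed.

```python
def transcribe(audio_path: str,
               model_id: str = "your-username/whisper-large-v3-gl") -> str:
    """Transcribe one Galician audio file and return the text.

    `model_id` is a hypothetical placeholder, not the real repository name.
    """
    # Imported lazily so that defining this helper does not require
    # transformers (or a model download) until it is actually called.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    # Pin the language and task so Whisper skips language detection.
    result = asr(
        audio_path,
        generate_kwargs={"language": "galician", "task": "transcribe"},
    )
    return result["text"]
```

Calling `transcribe("sample.wav")` downloads the weights on first use, so a GPU or some patience is advisable for a large-v3 sized model.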

Evaluation results on 29239 audio samples of Galician:

Baseline model (before finetuning) on Galician

Word Error Rate (Normalized): 20.140
Word Error Rate (Orthographic): 25.293
Character Error Rate (Normalized): 7.427
Character Error Rate (Orthographic): 6.224
Loss: 1.905

Finetuned model (after finetuning) on Galician

Word Error Rate (Normalized): 5.143
Word Error Rate (Orthographic): 8.320
Character Error Rate (Normalized): 1.865
Character Error Rate (Orthographic): 2.446
Loss: 0.126
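To put the two Common Voice result blocks side by side, the relative error reduction can be computed directly from the normalized figures reported above. This is plain arithmetic on the card's own numbers; the helper name `relative_reduction` is introduced here for illustration only.

```python
def relative_reduction(baseline: float, finetuned: float) -> float:
    """Percentage by which the error rate dropped relative to the baseline."""
    return 100.0 * (baseline - finetuned) / baseline

# Normalized WER and CER on the Common Voice Galician evaluation samples.
wer_drop = relative_reduction(20.140, 5.143)
cer_drop = relative_reduction(7.427, 1.865)

print(f"Relative WER reduction: {wer_drop:.1f}%")  # 74.5%
print(f"Relative CER reduction: {cer_drop:.1f}%")  # 74.9%
```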

Finetuned model (after finetuning) on the Galician FLEURS test set (total of 927 samples)

Word Error Rate (Normalized): 9.804
Word Error Rate (Orthographic): 13.147
Character Error Rate (Normalized): 5.827
Character Error Rate (Orthographic): 5.007
Loss: 0.383
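The card does not define how the "Normalized" and "Orthographic" variants are computed. The sketch below assumes the conventional reading: orthographic scores compare the raw text, while normalized scores first lowercase the text and strip punctuation. The `simple_normalize` function is a simplified stand-in chosen for illustration and is not the normalizer used to produce the numbers above.

```python
import string

def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def simple_normalize(text: str) -> str:
    # Assumption: lowercase and drop ASCII punctuation only.
    return text.lower().translate(str.maketrans("", "", string.punctuation))

def wer(reference: str, hypothesis: str, normalize: bool = False) -> float:
    """Word error rate in percent; the reference must contain at least one word."""
    if normalize:
        reference, hypothesis = simple_normalize(reference), simple_normalize(hypothesis)
    ref_words, hyp_words = reference.split(), hypothesis.split()
    return 100.0 * edit_distance(ref_words, hyp_words) / len(ref_words)

def cer(reference: str, hypothesis: str, normalize: bool = False) -> float:
    """Character error rate in percent; the reference must be non-empty."""
    if normalize:
        reference, hypothesis = simple_normalize(reference), simple_normalize(hypothesis)
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)

reference = "Bos días, Galicia."
hypothesis = "bos días galicia"
orthographic_wer = wer(reference, hypothesis)
normalized_wer = wer(reference, hypothesis, normalize=True)
```

In this toy pair every word differs only in casing or punctuation, so the orthographic WER is 100% while the normalized WER is 0%, which illustrates why the orthographic word error rates reported above are higher than the normalized ones.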