whipa-base-cv

This Whisper-for-IPA (WhIPA) model is a fine-tuned version of openai/whisper-base on a subset of the Common Voice 11 dataset (1,000 samples each from Greek, Finnish, Hungarian, Japanese, Maltese, Polish, and Tamil) with G2P-generated IPA transcriptions. The recommended checkpoint (checkpoint-440) achieves the following results on the evaluation set:

  • Train Loss: 0.7632
  • Validation Loss: 1.0754
  • Levenshtein distance (Lvnshtn): 19.1200
  • Character error rate (CER): 0.5521
  • Normalized CER (Cer Norm): 0.4570
  • Phone edit distance (PED): 17.9029
  • Phone error rate (PER): 0.5793
  • Phone feature error rate (PFER): 18.4631
  • Levenshtein phone distance (multipa): 6.4574
  • Weighted feature edit distance (Panphon): 63.6318
  • Weighted feature edit rate (Panphon): 1.6310
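As a reference for the string-level metrics above, here is a minimal sketch of Levenshtein distance and CER (edit distance normalized by reference length). The phone-level metrics (PED, PER, PFER, and the Panphon feature distances) follow the same pattern but operate on IPA phone segments or articulatory feature vectors rather than characters; the helper names below are illustrative, not from the WhIPA codebase.

```python
def levenshtein(ref, hyp):
    """Edit distance between two sequences via dynamic programming."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (r != h)))   # substitution
        prev = curr
    return prev[-1]

def cer(ref, hyp):
    """Character error rate: edit distance / reference length."""
    return levenshtein(ref, hyp) / len(ref)

print(levenshtein("kitten", "sitting"))   # 3
print(round(cer("kitten", "sitting"), 2)) # 0.5
```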

Model description

For deployment and description, please refer to https://github.com/jshrdt/whipa.

```python
from transformers import WhisperForConditionalGeneration, WhisperTokenizer, WhisperProcessor

# Load the recommended checkpoint (step 440)
whipa_model = WhisperForConditionalGeneration.from_pretrained("jshrdt/whipa-base-cv/checkpoint-440")

# WhIPA uses a custom <|ip|> language token for IPA transcription
whipa_model.generation_config.language = "<|ip|>"
whipa_model.generation_config.task = "transcribe"

whipa_tokenizer = WhisperTokenizer.from_pretrained("jshrdt/whipa-base-cv", task="transcribe")
whipa_processor = WhisperProcessor.from_pretrained("jshrdt/whipa-base-cv", task="transcribe")
```
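With the model, tokenizer, and processor loaded as above, transcription follows the standard Whisper inference pattern. A minimal sketch, assuming a 16 kHz mono waveform (`audio_array` is a placeholder; load it with e.g. librosa or soundfile):

```python
import torch

# Placeholder input: a 16 kHz mono waveform as a 1-D float array, e.g.
# audio_array, sr = librosa.load("sample.wav", sr=16000)

# Convert the waveform to log-mel input features
inputs = whipa_processor(audio_array, sampling_rate=16000, return_tensors="pt")

# Generate token ids and decode them to an IPA string
with torch.no_grad():
    pred_ids = whipa_model.generate(inputs.input_features)
ipa_transcription = whipa_tokenizer.batch_decode(pred_ids, skip_special_tokens=True)[0]
print(ipa_transcription)
```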

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 64
  • eval_batch_size: 8
  • seed: 42
  • optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 110
  • training_steps: 1100
  • mixed_precision_training: Native AMP
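The hyperparameters above can be reconstructed as a Hugging Face `Seq2SeqTrainingArguments` config. This is a sketch under the assumption that the standard `Seq2SeqTrainer` was used; `output_dir` and the evaluation/save cadence (every 220 steps, inferred from the results table) are assumptions, not taken from the original training script.

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whipa-base-cv",        # assumption
    learning_rate=1e-5,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",                 # AdamW with default betas=(0.9, 0.999), eps=1e-8
    lr_scheduler_type="linear",
    warmup_steps=110,
    max_steps=1100,
    fp16=True,                           # native AMP mixed precision
    eval_steps=220,                      # assumption, inferred from the results table
    save_steps=220,                      # assumption
)
```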

Training results

| Training Loss | Epoch | Step | Validation Loss | Lvnshtn | Cer | Cer Norm | Ped | Per | Pfer | Lhpd (mipa) | Wefed | Wefer | Time |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.342 | 1.1 | 220 | 1.3713 | 21.4400 | 0.6240 | 0.5221 | 19.6229 | 0.6381 | 20.6903 | 7.1986 | 65.5925 | 1.8090 | 5664.7295 |
| 0.7632 | 3.1 | 440 | 1.0754 | 19.1200 | 0.5521 | 0.4570 | 17.9029 | 0.5793 | 18.4631 | 6.4574 | 63.6318 | 1.6310 | 11416.7465 |
| 0.6067 | 5.1 | 660 | 0.9581 | 21.0886 | 0.6204 | 0.4421 | 19.2886 | 0.6486 | 22.4988 | 7.5708 | 80.1193 | 1.7244 | 17295.4013 |
| 0.529 | 7.1 | 880 | 0.9254 | 19.2086 | 0.5804 | 0.4241 | 18.0886 | 0.6197 | 21.9187 | 7.2060 | 75.0221 | 1.6756 | 23140.0079 |
| 0.4973 | 9.1 | 1100 | 0.9085 | 23.9829 | 0.7639 | 0.4420 | 22.3029 | 0.8247 | 31.1954 | 9.3198 | 105.6689 | 1.8891 | 28951.3640 |

Framework versions

  • Transformers 4.48.3
  • Pytorch 2.6.0+cu124
  • Datasets 3.2.0
  • Tokenizers 0.21.0