This model is part of the (lo)whipa-models collection: full and PEFT LoRA (LoWhIPA) fine-tuned Whisper-base and Whisper-large-v2 models for language-agnostic IPA transcription of speech.
This Whisper-for-IPA (WhIPA) model is a fine-tuned version of openai/whisper-base on a subset of the CommonVoice 11 dataset (1k samples each from Greek, Finnish, Hungarian, Japanese, Maltese, Polish, and Tamil) with G2P-derived IPA transcriptions. The recommended checkpoint (Ckpt4, checkpoint-440) achieves the results reported in the training table below.
For deployment and description, please refer to https://github.com/jshrdt/whipa.
```python
from transformers import WhisperForConditionalGeneration, WhisperTokenizer, WhisperProcessor

# Load the recommended checkpoint
whipa_model = WhisperForConditionalGeneration.from_pretrained("jshrdt/whipa-base-cv/checkpoint-440")

# Set the custom IPA language token and the transcription task
whipa_model.generation_config.language = "<|ip|>"
whipa_model.generation_config.task = "transcribe"

whipa_tokenizer = WhisperTokenizer.from_pretrained("jshrdt/whipa-base-cv", task="transcribe")
whipa_processor = WhisperProcessor.from_pretrained("jshrdt/whipa-base-cv", task="transcribe")
```
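Once the model and processor are loaded, transcription can be wired up along these lines. This is a minimal sketch, not from the original card: the function name is illustrative, and it assumes you already have a 16 kHz mono waveform as a 1-D float array (Whisper's expected input).

```python
def transcribe_ipa(model, processor, audio_array, sampling_rate=16000):
    """Transcribe a 16 kHz mono waveform to an IPA string (illustrative helper)."""
    # Convert the raw waveform into the log-mel input features Whisper expects
    inputs = processor(audio_array, sampling_rate=sampling_rate, return_tensors="pt")
    # Generate token ids with the language/task settings configured above
    predicted_ids = model.generate(inputs.input_features)
    # Decode token ids back to text, dropping special tokens such as <|ip|>
    return processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
```

Usage would then be `transcribe_ipa(whipa_model, whipa_processor, waveform)` for each utterance.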
Training progressed as follows (loss and phone-level error metrics at each evaluation step):
| Training Loss | Epoch | Step | Validation Loss | Levenshtein | CER | CER (norm.) | PED | PER | PFER | LHPD (mipa) | WEFED | WEFER | Time |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.342 | 1.1 | 220 | 1.3713 | 21.4400 | 0.6240 | 0.5221 | 19.6229 | 0.6381 | 20.6903 | 7.1986 | 65.5925 | 1.8090 | 5664.7295 |
| 0.7632 | 3.1 | 440 | 1.0754 | 19.1200 | 0.5521 | 0.4570 | 17.9029 | 0.5793 | 18.4631 | 6.4574 | 63.6318 | 1.6310 | 11416.7465 |
| 0.6067 | 5.1 | 660 | 0.9581 | 21.0886 | 0.6204 | 0.4421 | 19.2886 | 0.6486 | 22.4988 | 7.5708 | 80.1193 | 1.7244 | 17295.4013 |
| 0.529 | 7.1 | 880 | 0.9254 | 19.2086 | 0.5804 | 0.4241 | 18.0886 | 0.6197 | 21.9187 | 7.2060 | 75.0221 | 1.6756 | 23140.0079 |
| 0.4973 | 9.1 | 1100 | 0.9085 | 23.9829 | 0.7639 | 0.4420 | 22.3029 | 0.8247 | 31.1954 | 9.3198 | 105.6689 | 1.8891 | 28951.3640 |
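For reference, CER (character error rate) is the Levenshtein edit distance between the predicted and reference IPA strings, normalised by reference length. A minimal pure-Python sketch of this standard definition (the function names are illustrative, not from the WhIPA codebase):

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    prev = list(range(len(hyp) + 1))  # distances from empty prefix of ref
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (free if chars match)
            ))
        prev = curr
    return prev[-1]

def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance over reference length."""
    return levenshtein(ref, hyp) / len(ref)
```

For example, `cer("kat", "cat")` is 1/3: one substitution over a three-character reference.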
Base model: openai/whisper-base