This model is part of the (lo)whipa-models collection: full and PEFT LoRA (LoWhIPA) fine-tuned Whisper-base and Whisper-large-v2 models for language-agnostic IPA transcription of speech.
This Whisper-for-IPA (WhIPA) model is a fine-tuned version of openai/whisper-base on a subset of the CommonVoice 11 dataset (1k samples each from Greek, Finnish, Hungarian, Japanese, Maltese, Polish, and Tamil) with G2P-derived IPA transcriptions. The recommended checkpoint (Ckpt4, checkpoint-440) achieves the results reported in the training table below.
For deployment and description, please refer to https://github.com/jshrdt/whipa.
```python
from transformers import WhisperForConditionalGeneration, WhisperTokenizer, WhisperProcessor

# Load the recommended checkpoint
whipa_model = WhisperForConditionalGeneration.from_pretrained("jshrdt/whipa-base-cv/checkpoint-440")

# Set the custom IPA language token and the transcription task
whipa_model.generation_config.language = "<|ip|>"
whipa_model.generation_config.task = "transcribe"

whipa_tokenizer = WhisperTokenizer.from_pretrained("jshrdt/whipa-base-cv", task="transcribe")
whipa_processor = WhisperProcessor.from_pretrained("jshrdt/whipa-base-cv", task="transcribe")
```
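Once the model and processor are loaded, transcription can be wired up along these lines. This is a minimal sketch, not from the original card: the function name is illustrative, and it assumes you already have a 16 kHz mono waveform as a 1-D float array (Whisper's expected input).

```python
def transcribe_ipa(model, processor, audio_array, sampling_rate=16000):
    """Transcribe a 16 kHz mono waveform to an IPA string (illustrative helper)."""
    # Convert the raw waveform into the log-mel input features Whisper expects
    inputs = processor(audio_array, sampling_rate=sampling_rate, return_tensors="pt")
    # Generate token ids with the language/task settings configured above
    predicted_ids = model.generate(inputs.input_features)
    # Decode token ids back to text, dropping special tokens such as <|ip|>
    return processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
```

Usage would then be `transcribe_ipa(whipa_model, whipa_processor, waveform)` for each utterance.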
Training progressed as follows (loss and phone-level error metrics at each evaluation step):
| Training Loss | Epoch | Step | Validation Loss | Levenshtein | CER | CER (norm.) | PED | PER | PFER | LHPD (mipa) | WEFED | WEFER | Time |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.342 | 1.1 | 220 | 1.3713 | 21.4400 | 0.6240 | 0.5221 | 19.6229 | 0.6381 | 20.6903 | 7.1986 | 65.5925 | 1.8090 | 5664.7295 |
| 0.7632 | 3.1 | 440 | 1.0754 | 19.1200 | 0.5521 | 0.4570 | 17.9029 | 0.5793 | 18.4631 | 6.4574 | 63.6318 | 1.6310 | 11416.7465 |
| 0.6067 | 5.1 | 660 | 0.9581 | 21.0886 | 0.6204 | 0.4421 | 19.2886 | 0.6486 | 22.4988 | 7.5708 | 80.1193 | 1.7244 | 17295.4013 |
| 0.529 | 7.1 | 880 | 0.9254 | 19.2086 | 0.5804 | 0.4241 | 18.0886 | 0.6197 | 21.9187 | 7.2060 | 75.0221 | 1.6756 | 23140.0079 |
| 0.4973 | 9.1 | 1100 | 0.9085 | 23.9829 | 0.7639 | 0.4420 | 22.3029 | 0.8247 | 31.1954 | 9.3198 | 105.6689 | 1.8891 | 28951.3640 |
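For reference, CER (character error rate) is the Levenshtein edit distance between the predicted and reference IPA strings, normalised by reference length. A minimal pure-Python sketch of this standard definition (the function names are illustrative, not from the WhIPA codebase):

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    prev = list(range(len(hyp) + 1))  # distances from empty prefix of ref
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (free if chars match)
            ))
        prev = curr
    return prev[-1]

def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance over reference length."""
    return levenshtein(ref, hyp) / len(ref)
```

For example, `cer("kat", "cat")` is 1/3: one substitution over a three-character reference.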
Base model: openai/whisper-base