(lo)whipa-models collection: Full and PEFT LoRA (LoWhIPA) fine-tuned Whisper-base and Whisper-large-v2 models for language-agnostic IPA transcription of speech.
This Whisper-for-IPA (WhIPA) model adapter is a PEFT LoRA fine-tuned version of openai/whisper-large-v2 on a subset (1k samples) of the Mandarin THCHS-30 database (https://arxiv.org/pdf/1512.01882) with IPA transcriptions by Taubert (2023, https://zenodo.org/records/7528596).
For deployment details and a full description, please refer to https://github.com/jshrdt/whipa. The adapter can be loaded on top of the base Whisper model as follows:
```python
from transformers import WhisperForConditionalGeneration, WhisperTokenizer, WhisperProcessor
from peft import PeftModel

# Load the tokenizer and register "<|ip|>" as an additional special (language) token for IPA.
tokenizer = WhisperTokenizer.from_pretrained("openai/whisper-large-v2", task="transcribe")
tokenizer.add_special_tokens({"additional_special_tokens": ["<|ip|>"] + tokenizer.all_special_tokens})

# Load the base model, map "<|ip|>" to a language id, and resize the embeddings for the new token.
base_model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v2")
base_model.generation_config.lang_to_id["<|ip|>"] = tokenizer.convert_tokens_to_ids(["<|ip|>"])[0]
base_model.resize_token_embeddings(len(tokenizer))

# Attach the LoWhIPA LoRA adapter and configure generation for IPA transcription.
whipa_model = PeftModel.from_pretrained(base_model, "jshrdt/lowhipa-large-thchs30")
whipa_model.generation_config.language = "<|ip|>"
whipa_model.generation_config.task = "transcribe"

whipa_processor = WhisperProcessor.from_pretrained("openai/whisper-large-v2", task="transcribe")
```
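Inference then only requires log-Mel features from the processor. The snippet below is a minimal sketch that reuses the objects defined above; the file name `example.wav` and the librosa-based loading are illustrative assumptions, not part of the WhIPA repository.

```python
import librosa
import torch

# Minimal inference sketch; "example.wav" is a placeholder for any 16 kHz speech recording.
audio, _ = librosa.load("example.wav", sr=16000)

# Convert the waveform into the log-Mel input features expected by Whisper.
inputs = whipa_processor(audio, sampling_rate=16000, return_tensors="pt")

# Generate IPA tokens; language and task were already set on the generation config above.
with torch.no_grad():
    predicted_ids = whipa_model.generate(input_features=inputs.input_features)

# Decode with the tokenizer that contains the added "<|ip|>" token.
ipa_transcription = tokenizer.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(ipa_transcription)
```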
Training results:
| Training Loss | Epoch   | Step | Validation Loss |
|:-------------:|:-------:|:----:|:---------------:|
| 0.369         | 2.0323  | 126  | 0.2991          |
| 0.2183        | 4.0645  | 252  | 0.2479          |
| 0.1622        | 6.0968  | 378  | 0.2531          |
| 0.1124        | 8.1290  | 504  | 0.2733          |
| 0.0692        | 10.1613 | 630  | 0.2962          |
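For context on how such an adapter is typically produced, the sketch below shows one way to set up LoRA fine-tuning of Whisper with PEFT and the Hugging Face training utilities. The rank, target modules, and all hyperparameter values are illustrative assumptions, not the settings actually used for this checkpoint.

```python
from transformers import WhisperForConditionalGeneration, Seq2SeqTrainingArguments
from peft import LoraConfig, get_peft_model

base = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v2")

# Illustrative LoRA configuration; rank, alpha, and target modules are assumptions.
lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    target_modules=["q_proj", "v_proj"],  # attention projections, a common choice for Whisper
    lora_dropout=0.05,
    bias="none",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the small LoRA matrices are trainable

# Illustrative training arguments; placeholder values, not the hyperparameters of this adapter.
training_args = Seq2SeqTrainingArguments(
    output_dir="./lowhipa-large-thchs30",
    per_device_train_batch_size=16,
    learning_rate=1e-3,
    num_train_epochs=10,
    fp16=True,
)
# A Seq2SeqTrainer would then be constructed with a speech dataset and a data collator
# producing (input_features, labels) pairs, and run via trainer.train().
```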
Base model: openai/whisper-large-v2