lowhipa-base-thchs30

This Whisper-for-IPA (WhIPA) adapter is a PEFT LoRA fine-tune of openai/whisper-base on a 1,000-sample subset of the Mandarin THCHS-30 corpus (https://arxiv.org/pdf/1512.01882), using the IPA transcriptions by Taubert (2023, https://zenodo.org/records/7528596).

Model description

For deployment details and a full description, see https://github.com/jshrdt/whipa. The base model and adapter can be loaded as follows:

from transformers import WhisperForConditionalGeneration, WhisperTokenizer, WhisperProcessor
from peft import PeftModel

# Register the custom IPA language token <|ip|> with the base tokenizer
tokenizer = WhisperTokenizer.from_pretrained("openai/whisper-base", task="transcribe")
tokenizer.add_special_tokens({"additional_special_tokens": ["<|ip|>"] + tokenizer.all_special_tokens})

# Map the new token to a language ID and resize the embeddings to the extended vocabulary
base_model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-base")
base_model.generation_config.lang_to_id["<|ip|>"] = tokenizer.convert_tokens_to_ids(["<|ip|>"])[0]
base_model.resize_token_embeddings(len(tokenizer))

# Attach the LoRA adapter to the base model
whipa_model = PeftModel.from_pretrained(base_model, "jshrdt/lowhipa-base-thchs30")

# Generate with the IPA token as the target "language"
whipa_model.generation_config.language = "<|ip|>"
whipa_model.generation_config.task = "transcribe"

whipa_processor = WhisperProcessor.from_pretrained("openai/whisper-base", task="transcribe")
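
Inference then follows the standard Whisper pipeline. A minimal sketch, assuming librosa for audio loading and a placeholder file name; any mono 16 kHz waveform works:

import torch
import librosa  # assumed here; any loader yielding 16 kHz mono audio works

# Load a speech sample at Whisper's expected 16 kHz sampling rate
speech, sr = librosa.load("example.wav", sr=16000)  # example.wav is a placeholder

# Extract log-mel features and generate IPA token IDs
inputs = whipa_processor(speech, sampling_rate=sr, return_tensors="pt")
with torch.no_grad():
    predicted_ids = whipa_model.generate(input_features=inputs.input_features)

print(whipa_processor.batch_decode(predicted_ids, skip_special_tokens=True)[0])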

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the sketch after this list):

  • learning_rate: 0.001
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 42
  • optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • lr_scheduler_warmup_steps: 100
  • training_steps: 630
  • mixed_precision_training: Native AMP
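
For orientation, a minimal sketch of how these values map onto transformers Seq2SeqTrainingArguments; the output directory and all LoRA settings are illustrative placeholders (the adapter's actual LoRA values are recorded in adapter_config.json):

from transformers import Seq2SeqTrainingArguments
from peft import LoraConfig

# The hyperparameters listed above, expressed as training arguments;
# output_dir is a placeholder.
training_args = Seq2SeqTrainingArguments(
    output_dir="lowhipa-base-thchs30",
    learning_rate=1e-3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",        # AdamW, betas=(0.9, 0.999), eps=1e-8
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    warmup_steps=100,           # a nonzero warmup_steps takes precedence over warmup_ratio
    max_steps=630,
    fp16=True,                  # "Native AMP" mixed precision
)

# Illustrative LoRA settings only; the shipped adapter's true rank and
# target modules are in its adapter_config.json.
lora_config = LoraConfig(r=32, lora_alpha=64, target_modules=["q_proj", "v_proj"])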

Training results

Training Loss   Epoch     Step   Validation Loss
0.7877          2.0323    126    0.5588
0.3438          4.0645    252    0.3379
0.2765          6.0968    378    0.3056
0.2425          8.1290    504    0.2966
0.2195          10.1613   630    0.2911

Framework versions

  • PEFT 0.15.1
  • Transformers 4.48.3
  • Pytorch 2.6.0+cu124
  • Datasets 3.2.0
  • Tokenizers 0.21.0
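
To reproduce this environment, the versions above can be pinned in a requirements file. A sketch, assuming the standard PyPI package names and the PyTorch cu124 wheel index for the GPU build:

--extra-index-url https://download.pytorch.org/whl/cu124
torch==2.6.0+cu124
peft==0.15.1
transformers==4.48.3
datasets==3.2.0
tokenizers==0.21.0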