
lowhipa-base-asc

This Whisper-for-IPA (WhIPA) adapter is a PEFT LoRA fine-tuned version of openai/whisper-base, trained on a 1k-sample subset of the Arabic Speech Corpus (https://en.arabicspeechcorpus.com) with custom IPA transcriptions transliterated from the corpus's Buckwalter transcriptions. The resulting ASC-IPA dataset is available at https://doi.org/10.5281/zenodo.17111977.
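
To illustrate the transliteration step, here is a minimal sketch of Buckwalter-to-IPA symbol mapping. The mapping table and helper below are illustrative placeholders, not the actual conversion pipeline used to build ASC-IPA:

# Illustrative only: a few standard Buckwalter-to-IPA correspondences
# of the kind used to derive ASC-IPA; a real mapping also handles
# diacritics, digraphs, and contextual rules.
BW_TO_IPA = {"$": "ʃ", "E": "ʕ", "H": "ħ", "T": "tˤ", "A": "aː", "y": "j"}

def bw_to_ipa(text: str) -> str:
    # Naive character-by-character substitution.
    return "".join(BW_TO_IPA.get(ch, ch) for ch in text)

print(bw_to_ipa("$Ay"))  # ʃaːj ('tea')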

Model description

For deployment and description, please refer to https://github.com/jshrdt/whipa.

from transformers import WhisperForConditionalGeneration, WhisperTokenizer, WhisperProcessor
from peft import PeftModel

# Load the base tokenizer and register the custom <|ip|> IPA language token,
# keeping Whisper's existing special tokens intact.
tokenizer = WhisperTokenizer.from_pretrained("openai/whisper-base", task="transcribe")
tokenizer.add_special_tokens({"additional_special_tokens": ["<|ip|>"] + tokenizer.all_special_tokens})

# Load the base model, map <|ip|> to its new token id, and grow the
# embedding matrix to cover the added token.
base_model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-base")
base_model.generation_config.lang_to_id["<|ip|>"] = tokenizer.convert_tokens_to_ids(["<|ip|>"])[0]
base_model.resize_token_embeddings(len(tokenizer))

# Attach the LoRA adapter weights on top of the base model.
whipa_model = PeftModel.from_pretrained(base_model, "jshrdt/lowhipa-base-asc")

# Decode with the IPA language token and the transcription task.
whipa_model.generation_config.language = "<|ip|>"
whipa_model.generation_config.task = "transcribe"

whipa_processor = WhisperProcessor.from_pretrained("openai/whisper-base", task="transcribe")
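
A minimal inference sketch follows. The audio path, librosa loading, and 16 kHz resampling are assumptions for illustration, not part of this card:

import librosa
import torch

# Load a recording and resample to Whisper's expected 16 kHz.
audio, sr = librosa.load("recording.wav", sr=16000)  # placeholder path

# Compute log-Mel input features with the Whisper processor.
inputs = whipa_processor(audio, sampling_rate=sr, return_tensors="pt")

# Generate with the <|ip|> language token configured above.
with torch.no_grad():
    predicted_ids = whipa_model.generate(input_features=inputs.input_features)

# Decode to an IPA string, dropping Whisper's special tokens.
print(whipa_processor.batch_decode(predicted_ids, skip_special_tokens=True)[0])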

Intended uses & limitations

More information needed

Training and evaluation data

Trained and evaluated on ASC-IPA (see description above): a 1k-sample subset of the Arabic Speech Corpus with custom IPA transcriptions, available at https://doi.org/10.5281/zenodo.17111977.

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.001
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 42
  • optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10
  • mixed_precision_training: Native AMP
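
For reference, these settings map onto transformers' Seq2SeqTrainingArguments roughly as follows. This is a sketch under the assumption that the Trainer API was used; output_dir is a placeholder:

from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="lowhipa-base-asc",   # placeholder
    learning_rate=1e-3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",             # AdamW with default betas/epsilon
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    num_train_epochs=10,
    fp16=True,                       # native AMP mixed precision
)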

Training results

Training Loss | Epoch | Step | Validation Loss
------------- | ----- | ---- | ---------------
0.5071        | 2.0   | 126  | 0.4070
0.2359        | 4.0   | 252  | 0.2963
0.1490        | 6.0   | 378  | 0.2626
0.1051        | 8.0   | 504  | 0.2578
0.0811        | 10.0  | 630  | 0.2584

Framework versions

  • PEFT 0.15.1
  • Transformers 4.48.3
  • PyTorch 2.6.0+cu124
  • Datasets 3.2.0