dgx1_whisper_base_mozilla_noisy_teacher_distil_epochs_50_batch_16

This model is a fine-tuned version of rohitp1/dgx1_whisper_base_finetune_teacher_babble_noise_mozilla_100_epochs_batch_16 on an unspecified dataset. It achieves the following results on the evaluation set (a minimal loading sketch follows the list):

  • Loss: 1.0845
  • WER: 33.7337
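
The checkpoint can be loaded for inference with the standard Transformers ASR pipeline. This is a minimal, hedged sketch, not a published usage recipe: the repository id is inferred from the model name, and the audio file name is a placeholder.

```python
# Minimal inference sketch. Assumes the repo id matches the model name above
# and that "audio.wav" is a 16 kHz mono recording; both are placeholders.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="rohitp1/dgx1_whisper_base_mozilla_noisy_teacher_distil_epochs_50_batch_16",
)
print(asr("audio.wav")["text"])
```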

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch of the equivalent training arguments follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 16
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 256
  • total_train_batch_size: 4096
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine_with_restarts
  • lr_scheduler_warmup_ratio: 0.2
  • num_epochs: 50
  • mixed_precision_training: Native AMP
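
For reference, these settings map onto Hugging Face Seq2SeqTrainingArguments roughly as shown below. This is a hedged reconstruction from the list above, not the actual training script; output_dir is a placeholder.

```python
# Hedged reconstruction of the hyperparameters above; the real training
# script is not published, and output_dir is an illustrative placeholder.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="dgx1_whisper_base_mozilla_noisy_teacher_distil_epochs_50_batch_16",
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=1,
    seed=42,
    gradient_accumulation_steps=256,  # 16 x 256 = 4096 effective train batch size
    lr_scheduler_type="cosine_with_restarts",
    warmup_ratio=0.2,
    num_train_epochs=50,
    fp16=True,                        # "Native AMP" mixed-precision training
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```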

Training results

| Training Loss | Epoch | Step | Validation Loss | WER     |
|--------------:|------:|-----:|----------------:|--------:|
|        0.1214 |  4.41 |  150 |          0.5017 | 32.6283 |
|        0.1796 |  8.82 |  300 |          0.7222 | 33.1175 |
|        0.3176 | 13.23 |  450 |          0.8749 | 33.3612 |
|        0.3986 | 17.64 |  600 |          0.9770 | 33.5457 |
|        0.4497 | 22.06 |  750 |          1.0385 | 33.7111 |
|        0.4816 | 26.47 |  900 |          1.0845 | 33.7337 |
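
The WER values above appear to be scaled to percentages. How they were computed is not documented; a common approach, sketched here as an assumption, uses the `evaluate` library's WER metric (the transcript strings are illustrative, not from the evaluation set).

```python
# Sketch of a typical WER computation; whether this model card's scores
# were produced this way is an assumption. Requires `evaluate` and `jiwer`.
import evaluate

wer_metric = evaluate.load("wer")
predictions = ["the cat sat on the mat"]
references = ["the cat sat on a mat"]
wer = wer_metric.compute(predictions=predictions, references=references)
print(f"WER: {100 * wer:.4f}")  # scaled by 100 to match the table's convention
```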

Framework versions

  • Transformers 4.25.1
  • Pytorch 1.12.1
  • Datasets 2.8.0
  • Tokenizers 0.13.2