Whisper Small Chinese Base

This model is a fine-tuned version of openai/whisper-small on the MAGICDATA Mandarin Chinese Conversational Speech Corpus. It achieves the following results on the evaluation set:

Loss: 0.4830
CER: 20.50%


Model Description & Intended Uses

This model was trained on a 180-hour conversational speech dataset, making it suitable for conversational scenarios such as voice assistants.
The training arguments were chosen to encourage generalization and limit overfitting; the model reached its best CER around the second training epoch.

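As a usage sketch, the model can be loaded through the transformers automatic-speech-recognition pipeline. The audio path below is a placeholder, and pinning the language and task is an assumption (it simply disables Whisper's language auto-detection for Chinese-only use):

```python
from transformers import pipeline

# Load the fine-tuned checkpoint from the Hub.
asr = pipeline("automatic-speech-recognition", model="AntiPollo/whisper-small-zh")

# "sample.wav" is a placeholder path; forcing language/task is an assumption,
# but it is the standard way to avoid auto-detection with Whisper models.
result = asr("sample.wav", generate_kwargs={"language": "zh", "task": "transcribe"})
print(result["text"])
```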


Limitations

A disclaimer: this model has a relatively small parameter count, so its performance is unlikely to match that of larger ASR models. Additionally, since the base model, Whisper Small, is multilingual, this fine-tuned version should not be expected to surpass ASR models trained primarily on Chinese in transcription accuracy.



Training Hyperparameters

The following hyperparameters were used during training:

| Hyperparameter | Value |
|----------------|-------|
| learning_rate | 5e-6 |
| train_batch_size | 1 |
| gradient_accumulation_steps | 16 |
| eval_batch_size | 3 |
| warmup_steps | 600 |
| weight_decay | 0.01 |
| max_steps | 36000 |
| gradient_checkpointing | False |
| eval_strategy | steps |
| save_steps | 3000 |
| eval_steps | 3000 |
| logging_steps | 100 |
| load_best_model_at_end | True |
| metric_for_best_model | CER |
| greater_is_better | False |
| report_to | TensorBoard |
| dataloader_pin_memory | False (training did not run on CUDA, so pinned memory was disabled) |
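As a sketch, the table above maps onto Seq2SeqTrainingArguments roughly as follows; output_dir and predict_with_generate are assumptions not stated in the card:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-zh",   # hypothetical output path
    learning_rate=5e-6,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,    # effective train batch size of 16
    per_device_eval_batch_size=3,
    warmup_steps=600,
    weight_decay=0.01,
    max_steps=36000,
    gradient_checkpointing=False,
    eval_strategy="steps",
    save_steps=3000,
    eval_steps=3000,
    logging_steps=100,
    load_best_model_at_end=True,
    metric_for_best_model="cer",
    greater_is_better=False,           # lower CER is better
    report_to="tensorboard",
    dataloader_pin_memory=False,       # no CUDA device, so pinning is off
    predict_with_generate=True,        # assumption: needed to score CER on generated text
)
```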

Model Configuration

The following dropout hyperparameters were set in the model configuration:

| Parameter | Value |
|-----------|-------|
| dropout | 0.2 |
| attention_dropout | 0.2 |
| activation_dropout | 0.2 |
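These values can be applied when loading the base model, since from_pretrained forwards extra keyword arguments to the model config. A minimal sketch:

```python
from transformers import WhisperForConditionalGeneration

# Override the dropout settings at load time so the instantiated layers
# pick them up (setting model.config afterwards would not update layers
# that already copied the values at construction).
model = WhisperForConditionalGeneration.from_pretrained(
    "openai/whisper-small",
    dropout=0.2,
    attention_dropout=0.2,
    activation_dropout=0.2,
)
```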

Training Results

| Epoch | Validation Loss | CER (%) |
|-------|-----------------|---------|
| 0.26 | 0.4443 | 23.11 |
| 0.52 | 0.4358 | 22.27 |
| 0.79 | 0.4367 | 22.53 |
| 1.05 | 0.4733 | 22.55 |
| 1.31 | 0.4493 | 21.67 |
| 1.57 | 0.4595 | 21.57 |
| 1.84 | 0.4632 | 21.56 |
| 2.10 | 0.4830 | 20.50 |
| 2.36 | 0.4676 | 21.02 |
| 2.62 | 0.4820 | 22.66 |
| 2.89 | 0.4846 | 21.34 |
| 3.15 | 0.4976 | 21.50 |
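Since CER was the selection metric (metric_for_best_model with greater_is_better=False), the best checkpoint is the epoch-2.10 one at 20.50%. As a small sketch, CER can be computed with the Hugging Face evaluate library; the prediction/reference pair below is made up:

```python
import evaluate

cer_metric = evaluate.load("cer")

# Hypothetical pair: one wrong character out of six reference characters.
predictions = ["今天天气很好"]
references = ["今天天气真好"]
print(cer_metric.compute(predictions=predictions, references=references))
# -> 0.1666... (1 substitution / 6 reference characters)
```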

Framework Versions

| Library | Version |
|---------|---------|
| Transformers | 4.53.3 |
| PyTorch | 2.7.1 |
| Datasets | 4.0.0 |
| Tokenizers | 0.21.2 |