Whisper Small Chinese Base

This model is a fine-tuned version of openai/whisper-small on the MAGICDATA Mandarin Chinese Conversational Speech Corpus. It achieves the following results on the evaluation set:

Loss: 0.4830
CER: 20.50%


Model Description & Intended Uses

This model was trained on a 180-hour conversational speech dataset, making it suitable for conversational scenarios such as voice assistants.
The training arguments were chosen to encourage generalization and limit overfitting; the model reached its best CER around the second training epoch.

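As a usage sketch, the model can be loaded through the transformers automatic-speech-recognition pipeline. The audio path below is a placeholder, and pinning the language and task is an assumption (it simply disables Whisper's language auto-detection for Chinese-only use):

```python
from transformers import pipeline

# Load the fine-tuned checkpoint from the Hub.
asr = pipeline("automatic-speech-recognition", model="AntiPollo/whisper-small-zh")

# "sample.wav" is a placeholder path; forcing language/task is an assumption,
# but it is the standard way to avoid auto-detection with Whisper models.
result = asr("sample.wav", generate_kwargs={"language": "zh", "task": "transcribe"})
print(result["text"])
```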


Limitations

A disclaimer: this model has a relatively small parameter count, so its performance is unlikely to match that of larger ASR models. Additionally, since the base model, Whisper Small, is multilingual, this fine-tuned version should not be expected to surpass ASR models trained primarily on Chinese in transcription accuracy.



Training Hyperparameters

The following hyperparameters were used during training:

| Hyperparameter | Value |
|----------------|-------|
| learning_rate | 5e-6 |
| train_batch_size | 1 |
| gradient_accumulation_steps | 16 |
| eval_batch_size | 3 |
| warmup_steps | 600 |
| weight_decay | 0.01 |
| max_steps | 36000 |
| gradient_checkpointing | False |
| eval_strategy | steps |
| save_steps | 3000 |
| eval_steps | 3000 |
| logging_steps | 100 |
| load_best_model_at_end | True |
| metric_for_best_model | CER |
| greater_is_better | False |
| report_to | TensorBoard |
| dataloader_pin_memory | False (training did not run on CUDA, so pinned memory was disabled) |
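As a sketch, the table above maps onto Seq2SeqTrainingArguments roughly as follows; output_dir and predict_with_generate are assumptions not stated in the card:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-zh",   # hypothetical output path
    learning_rate=5e-6,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,    # effective train batch size of 16
    per_device_eval_batch_size=3,
    warmup_steps=600,
    weight_decay=0.01,
    max_steps=36000,
    gradient_checkpointing=False,
    eval_strategy="steps",
    save_steps=3000,
    eval_steps=3000,
    logging_steps=100,
    load_best_model_at_end=True,
    metric_for_best_model="cer",
    greater_is_better=False,           # lower CER is better
    report_to="tensorboard",
    dataloader_pin_memory=False,       # no CUDA device, so pinning is off
    predict_with_generate=True,        # assumption: needed to score CER on generated text
)
```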

Model Configuration

The following dropout hyperparameters were set in the model configuration:

| Parameter | Value |
|-----------|-------|
| dropout | 0.2 |
| attention_dropout | 0.2 |
| activation_dropout | 0.2 |
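These values can be applied when loading the base model, since from_pretrained forwards extra keyword arguments to the model config. A minimal sketch:

```python
from transformers import WhisperForConditionalGeneration

# Override the dropout settings at load time so the instantiated layers
# pick them up (setting model.config afterwards would not update layers
# that already copied the values at construction).
model = WhisperForConditionalGeneration.from_pretrained(
    "openai/whisper-small",
    dropout=0.2,
    attention_dropout=0.2,
    activation_dropout=0.2,
)
```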

Training Results

| Epoch | Validation Loss | CER (%) |
|-------|-----------------|---------|
| 0.26 | 0.4443 | 23.11 |
| 0.52 | 0.4358 | 22.27 |
| 0.79 | 0.4367 | 22.53 |
| 1.05 | 0.4733 | 22.55 |
| 1.31 | 0.4493 | 21.67 |
| 1.57 | 0.4595 | 21.57 |
| 1.84 | 0.4632 | 21.56 |
| 2.10 | 0.4830 | 20.50 |
| 2.36 | 0.4676 | 21.02 |
| 2.62 | 0.4820 | 22.66 |
| 2.89 | 0.4846 | 21.34 |
| 3.15 | 0.4976 | 21.50 |
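Since CER was the selection metric (metric_for_best_model with greater_is_better=False), the best checkpoint is the epoch-2.10 one at 20.50%. As a small sketch, CER can be computed with the Hugging Face evaluate library; the prediction/reference pair below is made up:

```python
import evaluate

cer_metric = evaluate.load("cer")

# Hypothetical pair: one wrong character out of six reference characters.
predictions = ["今天天气很好"]
references = ["今天天气真好"]
print(cer_metric.compute(predictions=predictions, references=references))
# -> 0.1666... (1 substitution / 6 reference characters)
```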

Framework Versions

| Library | Version |
|---------|---------|
| Transformers | 4.53.3 |
| PyTorch | 2.7.1 |
| Datasets | 4.0.0 |
| Tokenizers | 0.21.2 |