Important Note:

This model is intended for streaming use. It is trained to reduce false-positive transcriptions (hallucinations) on silent or background-noise audio.
Noise is labeled as "%nz"; remove or replace this pattern in the transcription result.
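A minimal sketch of that post-processing step, assuming the marker appears literally as "%nz" in the decoded text. The helper name `strip_noise_tag` and the whitespace clean-up are illustrative choices, not part of the model or its tooling.

```python
import re

NOISE_TAG = "%nz"  # noise label stated in the note above


def strip_noise_tag(text: str, replacement: str = "") -> str:
    """Remove (or replace) every noise tag in a transcription string."""
    cleaned = text.replace(NOISE_TAG, replacement)
    # Collapse whitespace runs left behind by removed tags (an assumption
    # about how the tag is spaced in real output).
    return re.sub(r"\s+", " ", cleaned).strip()
```

For example, `strip_noise_tag("%nz")` yields an empty string, which a streaming client could treat as "nothing was said" for that chunk.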


2025-07-21 CER:

| Dataset | Lang | Split | CER (%) |
|---|---|---|---|
| Training | yue | validation | 3.234 |
| mozilla-foundation/common_voice_17_0 | yue | test | 0.437 |
| mozilla-foundation/common_voice_17_0 | en | test (2k samples) | 5.18 |
| mozilla-foundation/common_voice_16_1 | zh-CN | test | 11.74 |
| JackyHoCL/cleaned_mixed_cantonese_and_english_speech | yue | test | 9.86 |
| Sunbird/urban-noise-uganda-61k:small(1k) | noise | half (500) | 12.7 |

2025-07-06 CER:

| Dataset | Lang | Split | CER (%) |
|---|---|---|---|
| Training | yue | validation | 8.95 |
| mozilla-foundation/common_voice_17_0 | yue | test | 8.78 |
| mozilla-foundation/common_voice_16_1 | yue | test | 8.76 |
| JackyHoCL/cleaned_mixed_cantonese_and_english_speech | yue | test | 8.00 |
| Sunbird/urban-noise-uganda-61k:small(1k) | noise | half (500) | 0.0 |
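For readers unfamiliar with the metric, here is a sketch of the standard character error rate: character-level edit distance divided by the number of reference characters, expressed in percent. The card does not say which library or text normalization produced the figures above, so the function names and the corpus-level aggregation here are assumptions for illustration only.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer_percent(references, hypotheses) -> float:
    """Corpus-level CER in percent: total edits / total reference characters."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    return 100.0 * total_edits / total_chars
```

One substituted character in a five-character reference gives a CER of 20%.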

Train Args:

per_device_train_batch_size=32,
learning_rate=1e-6,
gradient_accumulation_steps=1,
gradient_checkpointing=True,
per_device_eval_batch_size=16,
generation_max_length=225,

Hardware:
4 x NVIDIA Tesla V100 16GB
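The argument names listed under Train Args match fields of Hugging Face `Seq2SeqTrainingArguments`, although the card does not state which trainer was used. As a rough worked example, the sketch below derives the effective training batch size from those values, assuming all four GPUs were used in data-parallel training. That assumption and the helper name are not confirmed by the card.

```python
# Values copied from the Train Args list above.
train_args = {
    "per_device_train_batch_size": 32,
    "learning_rate": 1e-6,
    "gradient_accumulation_steps": 1,
    "gradient_checkpointing": True,
    "per_device_eval_batch_size": 16,
    "generation_max_length": 225,
}

NUM_GPUS = 4  # assumption: every listed V100 took part in data-parallel training


def effective_train_batch_size(args: dict, num_devices: int) -> int:
    """Samples consumed per optimizer step across all devices."""
    return (args["per_device_train_batch_size"]
            * args["gradient_accumulation_steps"]
            * num_devices)


print(effective_train_batch_size(train_args, NUM_GPUS))  # 32 * 1 * 4 = 128
```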


A real-time streaming application example built on this model:
https://github.com/JackyHoCL/whisper-realtime.git

FAQ:

  1. If you hit a tokenizer issue during inference, update transformers to version 4.46.3 or later:
     pip install --upgrade transformers
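To check the installed version before running inference, something like the following could be used. It relies only on the standard library. The helper names are hypothetical, and the parser deliberately ignores pre-release suffixes, so a build such as "4.46.3rc1" would count as meeting the minimum.

```python
from importlib.metadata import PackageNotFoundError, version

MIN_TRANSFORMERS = (4, 46, 3)  # minimum version named in the FAQ above


def parse_release(ver: str) -> tuple:
    """Leading numeric components of a version, e.g. '4.46.3.dev0' -> (4, 46, 3)."""
    parts = []
    for piece in ver.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(ver: str, minimum: tuple = MIN_TRANSFORMERS) -> bool:
    """True if the numeric release part of `ver` is at least `minimum`."""
    release = parse_release(ver)
    # Pad short versions such as "4.46" with zeros before comparing.
    padded = release + (0,) * (len(minimum) - len(release))
    return padded >= minimum


def installed_transformers_ok() -> bool:
    """True if transformers is installed and new enough, False otherwise."""
    try:
        return meets_minimum(version("transformers"))
    except PackageNotFoundError:
        return False
```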

Model:
JackyHoCL/whisper-large-v3-turbo-cantonese-noise-detection (Safetensors, 809M params, tensor type F32)