Konthee/whisper-th-large-v3-meeting-transcription

Model Card: Konthee/whisper-th-large-v3-meeting-transcription

Model Details

Model name: Konthee/whisper-th-large-v3-meeting-transcription
Base model: biodatlab/whisper-th-large-v3-combined
Fine-tuned on: meeting_transcription_audio_th dataset
License: Apache 2.0
Model weights license: CC-BY-SA 4.0 (reflecting dataset license)
Paper / Reference: This model is a fine-tuned version of the Whisper TH Large v3 model tailored for Thai meeting transcription scenarios.

Model Description

This model is based on the Whisper TH Large v3 architecture, originally developed for general-purpose speech recognition in Thai. It has been further fine-tuned on the meeting_transcription_audio_th dataset, which consists of Thai online meeting recordings and gold-standard transcripts from the 2025-ASR competition. The fine-tuning process focused on multi-speaker and acoustically challenging scenarios (noise, reverberation, overlapping speech) to improve performance in meeting transcription tasks.

Intended Uses & Applications

Meeting transcription: Providing accurate transcripts of Thai-language online meetings.
Note-taking automation: Assisting users in generating meeting notes from audio.
Accessibility: Enabling transcription services for hearing-impaired participants in Thai meetings.

Out-of-Scope Use Cases

Transcription of languages other than Thai.
Real-time transcription in extremely low-latency applications (this model is optimized for batch processing).
Highly specialized domains with vocabulary far outside general meeting contexts (e.g., medical diagnostics).

Training Data

Dataset: meeting_transcription_audio_th
A Thai online-meeting speech corpus with real and augmented recordings (noise, reverb, overlapping speech). Originally from the 2025-ASR competition; reformatted and packaged for Hugging Face by Konthee Bo.
Data size: ~32.5 hours total (20 h train, 2.5 h val, 10 h test).
Preprocessing: Audio normalized to 16 kHz WAV; transcripts cleaned for punctuation and speaker turns; metadata added for easier loading via the Hugging Face datasets library.
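
Because the corpus audio is stored as 16 kHz WAV, it can help to confirm that your own recordings match that format before transcription. The following is a minimal sketch using only Python's standard wave module; the helper names wav_info and needs_resampling are illustrative and not part of the dataset tooling.

```python
import wave

EXPECTED_RATE = 16_000  # sample rate stated for the corpus


def wav_info(path):
    """Return (sample_rate_hz, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        duration = wav.getnframes() / rate
    return rate, channels, duration


def needs_resampling(path, expected_rate=EXPECTED_RATE):
    """Return True if the file's sample rate differs from the expected one."""
    rate, _, _ = wav_info(path)
    return rate != expected_rate
```

Files for which needs_resampling returns True would have to be resampled to 16 kHz before being passed to the model.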

Usage Examples

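import torch
import torchaudio
from transformers import WhisperProcessor, WhisperForConditionalGeneration

model_id = "Konthee/whisper-th-large-v3-meeting-transcription"
processor = WhisperProcessor.from_pretrained(model_id)
model = WhisperForConditionalGeneration.from_pretrained(model_id)
model.eval()

# load audio; torchaudio returns a tensor of shape (channels, samples)
waveform, sampling_rate = torchaudio.load("meeting.wav")
# mix down to mono and resample to the 16 kHz the Whisper processor requires
waveform = waveform.mean(dim=0)
if sampling_rate != 16000:
    waveform = torchaudio.functional.resample(waveform, sampling_rate, 16000)
inputs = processor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")

# generate transcription (the processor pads or truncates the input to 30 s)
with torch.no_grad():
    generated_ids = model.generate(inputs.input_features)
transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)
print(transcription[0])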
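
Whisper models process audio in windows of at most 30 seconds, so a full meeting recording has to be split before each piece is passed to generate. The sketch below computes chunk boundaries in samples; the function name chunk_bounds and the 5-second overlap are illustrative assumptions, not settings reported by the authors.

```python
def chunk_bounds(num_samples, sample_rate=16_000, chunk_s=30.0, overlap_s=5.0):
    """Return (start, end) sample indices that cover the whole recording.

    Consecutive chunks overlap by `overlap_s` seconds, so a word cut at one
    boundary is likely to appear whole in the neighbouring chunk.
    """
    if overlap_s >= chunk_s:
        raise ValueError("overlap_s must be smaller than chunk_s")
    chunk = int(chunk_s * sample_rate)
    step = chunk - int(overlap_s * sample_rate)
    bounds = []
    start = 0
    while start < num_samples:
        end = min(start + chunk, num_samples)
        bounds.append((start, end))
        if end == num_samples:
            break  # the last chunk reached the end of the recording
        start += step
    return bounds
```

Each (start, end) pair can be used to slice the mono 16 kHz waveform before it goes to the processor. Merging the overlapping transcripts is left out of this sketch.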

Evaluation Results

Results retrieved from the AI Benchmark 2025 ASR Leaderboard: https://benchmark.ai.in.th/score/leaderboard/2025-asr

Split	WER (%)
public	13.51
private	18.70

Data sourced directly from the leaderboard metrics.

This model corresponds to team 220_อย่าคับ เจนมันเวิ่นเว้อป่าวว, which placed 1st on both the public and private leaderboards of the 2025-ASR competition.
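
For reference, word error rate is the number of substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the number of reference words. The sketch below computes it on pre-tokenized word lists. Because Thai is written without spaces between words, tokenization is assumed to happen beforehand and may differ from what the leaderboard uses, so this will not necessarily reproduce the scores above. The function name wer is illustrative.

```python
def wer(reference, hypothesis):
    """Word error rate (%) between two lists of tokens."""
    if not reference:
        raise ValueError("reference must contain at least one token")
    # prev[j] is the edit distance between reference[:i-1] and hypothesis[:j]
    prev = list(range(len(hypothesis) + 1))
    for i, ref_tok in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_tok in enumerate(hypothesis, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return 100.0 * prev[-1] / len(reference)


# Made-up example: one of four reference words is missing from the hypothesis
ref = ["วันนี้", "ประชุม", "เริ่ม", "สิบโมง"]
hyp = ["วันนี้", "ประชุม", "สิบโมง"]
print(wer(ref, hyp))  # 25.0
```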

APA

AI Thailand Benchmark Programs. (2025). 2025-ASR: Automatic Speech Recognition Task. Retrieved June 23, 2025, from https://benchmark.ai.in.th/task/detail/2025-asr

@misc{meeting_transcription_audio_2025,
  title        = {meeting_transcription_audio_th: A Thai Online-Meeting Speech Corpus for Multi-Speaker ASR},
  author       = {AI Thailand Benchmark Programs and Konthee Bo},
  year         = {2025},
  howpublished = {https://huggingface.co/datasets/Konthee/meeting_transcription_audio_th},
  note         = {Dataset reformatted and packaged by Konthee Bo; original data from the 2025-ASR competition},
  license      = {CC-BY-SA 4.0}
}

Authors

Konthee Boonmeeprakob
Pitikorn Khlaisamniang
