---
library_name: transformers
license: mit
metrics:
- name: wer
  type: wer
  value: 17.26
base_model:
- mesolitica/wav2vec2-xls-r-300m-mixed
pipeline_tag: automatic-speech-recognition
tags:
- wav2vec2
- asr
- automatic-speech-recognition
- malay
- english
- speech
---
|
|
|
# Model Card for Malay-English Fine-Tuned ASR Model
|
|
|
This model was fine-tuned for 10 epochs on approximately 50 hours of manually curated Malay-English code-switched audio. It achieves a Word Error Rate (WER) of 17.26% on a held-out evaluation set, down from 34.29% for the base model.
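To measure WER on your own held-out data, the `evaluate` library's `wer` metric can be used. A minimal sketch follows; the reference and prediction strings are invented for illustration only:

```python
import evaluate  # also requires: pip install jiwer

wer_metric = evaluate.load("wer")

# Hypothetical reference transcripts and model outputs, for illustration only
references = ["saya nak pergi ke pejabat hari ini", "can you tolong saya sekejap"]
predictions = ["saya nak pergi pejabat hari ini", "can you tolong saya sekejap"]

# WER = (substitutions + insertions + deletions) / number of reference words
print(wer_metric.compute(references=references, predictions=predictions))
```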
|
|
|
## Model Details
|
|
|
### Model Description
|
|
|
This model was fine-tuned from `mesolitica/wav2vec2-xls-r-300m-mixed` on a custom Malay-English dataset. It is designed to transcribe speech that mixes Malay and English, especially in informal or conversational contexts where code-switching is common.
|
|
|
- **Developed by:** mysterio
- **Model type:** CTC-based automatic speech recognition
- **Languages:** Malay, English
- **License:** MIT
- **Fine-tuned from:** [mesolitica/wav2vec2-xls-r-300m-mixed](https://huggingface.co/mesolitica/wav2vec2-xls-r-300m-mixed)
|
|
|
### Model Sources
|
|
|
- **Base Model:** [mesolitica/wav2vec2-xls-r-300m-mixed](https://huggingface.co/mesolitica/wav2vec2-xls-r-300m-mixed)
|
|
|
## Uses
|
|
|
### Direct Use
|
|
|
This model can be used to transcribe conversational Malay-English audio recordings, especially in domains such as the following (a long-form transcription sketch appears after this list):

- Broadcast interviews
- YouTube vlogs
- Podcasts
- Community recordings
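Recordings in these domains are often long-form. For CTC models, the `transformers` ASR pipeline supports chunked inference with overlapping strides; a minimal sketch follows, where the chunk and stride lengths are illustrative rather than tuned values:

```python
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="langminer/wav2vec2-custom-asr")

# Chunked inference for long recordings: 30 s windows with 5 s of overlapping
# context on each side, stitched back together by the pipeline.
result = asr("long_interview.wav", chunk_length_s=30, stride_length_s=5)
print(result["text"])
```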
|
|
|
### Downstream Use
|
|
|
The model can be fine-tuned further or used as part of downstream applications such as:

- Real-time transcription services
- Voice assistants tailored for Malaysian users
- Speech-driven translation systems
|
|
|
### Out-of-Scope Use
|
|
|
- High-stakes transcription scenarios (e.g., legal or medical contexts) where exact word accuracy is critical
- Languages other than Malay and English
- Noisy or far-field audio environments (unless fine-tuned further)
|
|
|
## Bias, Risks, and Limitations
|
|
|
### Known Limitations
|
|
|
- May underperform on accents or dialects that are not well represented in the training data
- Produces no punctuation, and casing may be inconsistent (the model is CTC-based and predicts characters only)
- Limited robustness to background noise or overlapping speakers
|
|
|
### Recommendations
|
|
|
- Always verify outputs for critical tasks
- Pair with punctuation restoration or speaker diarization for production-grade use
- Retrain with domain-specific data for higher accuracy (see the fine-tuning sketch after this list)
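A minimal sketch of continued fine-tuning with the `transformers` `Trainer` follows. It assumes a dataset with `audio` and `text` columns; the dataset name, output path, and hyperparameters are illustrative placeholders, not the recipe used for this checkpoint:

```python
from dataclasses import dataclass

import torch
from datasets import Audio, load_dataset
from transformers import Trainer, TrainingArguments, Wav2Vec2ForCTC, Wav2Vec2Processor

model_id = "langminer/wav2vec2-custom-asr"
processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

# Hypothetical dataset; replace with your own domain-specific data
ds = load_dataset("your-username/your-malay-english-dataset", split="train")
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))

def prepare(batch):
    audio = batch["audio"]
    batch["input_values"] = processor(
        audio["array"], sampling_rate=audio["sampling_rate"]
    ).input_values[0]
    batch["labels"] = processor.tokenizer(batch["text"]).input_ids
    return batch

ds = ds.map(prepare, remove_columns=ds.column_names)

@dataclass
class CTCCollator:
    processor: Wav2Vec2Processor

    def __call__(self, features):
        inputs = [{"input_values": f["input_values"]} for f in features]
        labels = [{"input_ids": f["labels"]} for f in features]
        batch = self.processor.feature_extractor.pad(inputs, padding=True, return_tensors="pt")
        labels_batch = self.processor.tokenizer.pad(labels, padding=True, return_tensors="pt")
        # Replace label padding with -100 so it is ignored by the CTC loss
        batch["labels"] = labels_batch["input_ids"].masked_fill(
            labels_batch["attention_mask"].ne(1), -100
        )
        return batch

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="wav2vec2-domain-finetuned",
        per_device_train_batch_size=8,
        num_train_epochs=5,
        learning_rate=1e-5,
        fp16=torch.cuda.is_available(),
    ),
    train_dataset=ds,
    data_collator=CTCCollator(processor),
)
trainer.train()
```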
|
|
|
## How to Get Started with the Model
|
|
|
```python
from transformers import pipeline

# Load the fine-tuned checkpoint from the Hugging Face Hub
asr = pipeline("automatic-speech-recognition", model="langminer/wav2vec2-custom-asr")

# The pipeline decodes the file and resamples it to 16 kHz before inference
transcription = asr("your_audio_file.wav")
print(transcription["text"])
```
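
If you prefer to work below the pipeline abstraction, the model can also be called directly through the processor with greedy CTC decoding. A minimal sketch follows; it uses `torchaudio` for loading and resampling, which is a convenience choice rather than a requirement:

```python
import torch
import torchaudio
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

model_id = "langminer/wav2vec2-custom-asr"
processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)
model.eval()

# Load audio and resample to the 16 kHz rate wav2vec 2.0 expects
waveform, sr = torchaudio.load("your_audio_file.wav")
if sr != 16_000:
    waveform = torchaudio.functional.resample(waveform, sr, 16_000)

inputs = processor(waveform.squeeze(0).numpy(), sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits

# Greedy CTC decoding: take the argmax per frame, then collapse repeats/blanks
pred_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(pred_ids)[0])
```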