	---
	license: apache-2.0
	language:
	- bn
	metrics:
	- wer
	- cer
	base_model:
	- ai4bharat/indicwav2vec_v1_bengali
	pipeline_tag: automatic-speech-recognition
	---


	<div align="center">
	<h1>🚨 BRDialect 🚨

	BanglaTalk: Towards Real-Time Speech Assistance for Bengali Regional Dialects </h1>
	📝 <a href="https://arxiv.org/abs/2510.06188"><b>Paper</b></a>, 🖥️ <a href="https://github.com/Jak57/BanglaTalk"><b>Github</b></a>
	</div>

	BRDialect is an automatic speech recognition (ASR) system trained on ten regional dialects of Bangladesh using the <a href="https://www.kaggle.com/competitions/ben10">Ben10</a> dataset from Bengali.AI.

	## Load the BRDialect ASR System

	Prerequisites (notebook syntax; drop the leading `!` in a regular shell)<br>
	```
	!pip install -U transformers
	!pip install torch librosa
	!pip install https://github.com/kpu/kenlm/archive/master.zip
	!pip install pyctcdecode
	```

	Log in to Hugging Face (replace `TOKEN` with your access token)<br>
	```python
	from huggingface_hub import login
	login("TOKEN")
	```

	Load base model and BRDialect<br>
	```python
	# Download the BRDialect 5-gram KenLM language model and the fine-tuned wav2vec2 weights
	from huggingface_hub import hf_hub_download

	kenlm_model_path = hf_hub_download(repo_id="Jakir057/BRDialect", filename="BRDialect/5gram_kenlm.arpa")
	state_dict_path = hf_hub_download(repo_id="Jakir057/BRDialect", filename="BRDialect/wav2vec2_bangla_regional_dialect.pth")
	```
	```python
	from transformers import AutoProcessor, AutoModelForCTC, Wav2Vec2ProcessorWithLM
	import torch
	import numpy as np
	import pyctcdecode
	import librosa

	base_model_id = "ai4bharat/indicwav2vec_v1_bengali"
	processor = AutoProcessor.from_pretrained(base_model_id)
	model = AutoModelForCTC.from_pretrained(base_model_id)
	# Load the fine-tuned BRDialect weights on top of the base model (mapped to CPU)
	model.load_state_dict(torch.load(state_dict_path, map_location="cpu")["model"])

	# pyctcdecode expects the labels ordered by token id
	vocab_dict = processor.tokenizer.get_vocab()
	sorted_vocab_dict = {k: v for k, v in sorted(vocab_dict.items(), key=lambda item: item[1])}
	decoder = pyctcdecode.build_ctcdecoder(
	    list(sorted_vocab_dict.keys()),
	    str(kenlm_model_path)
	)
	# Combine the feature extractor, tokenizer, and LM-backed decoder
	processor_with_lm = Wav2Vec2ProcessorWithLM(
	    feature_extractor=processor.feature_extractor,
	    tokenizer=processor.tokenizer,
	    decoder=decoder
	)
	model.freeze_feature_encoder()
	model.eval()
	```
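
The loading code sorts the tokenizer vocabulary by token id before handing it to `pyctcdecode.build_ctcdecoder`, because label `i` must correspond to column `i` of the model's logits. A minimal sketch of that ordering step, using a hypothetical toy vocabulary (`toy_vocab` and `labels_in_id_order` are illustrative names, not part of BRDialect or any library):

```python
# Hypothetical toy vocabulary; the real one comes from
# processor.tokenizer.get_vocab() and is much larger.
toy_vocab = {"ক": 2, "<pad>": 0, "|": 1, "ব": 3}


def labels_in_id_order(vocab):
    """Return the token strings sorted by their integer id."""
    return [token for token, _ in sorted(vocab.items(), key=lambda item: item[1])]


labels = labels_in_id_order(toy_vocab)
print(labels)
```

If the labels were passed in dictionary order instead, each logit column would be decoded as the wrong token.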

	## Transcription Generation
	```python
	sampling_rate = 16000
	path = "AUDIO_PATH"
	# Load the audio as 16 kHz mono
	frame, sr = librosa.load(path, sr=sampling_rate, mono=True)

	inputs = processor(
	    frame,
	    sampling_rate=sampling_rate,
	    return_tensors="pt",
	    padding=False
	)

	with torch.no_grad():
	    logits = model(inputs.input_values.to("cpu")).logits

	# Beam-search decoding with the KenLM language model
	np_logits = logits.squeeze(0).cpu().numpy()
	result = processor_with_lm.decode(np_logits, beam_width=256)
	text = result.text
	print(f"Transcription={text}")
	```
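
The model card metadata lists WER and CER as metrics. To check a transcription against a reference, both can be computed from the Levenshtein edit distance. The following is a minimal, dependency-free sketch, not the authors' evaluation code; the function names and the romanized example strings are hypothetical. It assumes whitespace tokenization for WER and counts Unicode code points for CER, which for Bengali script can differ from a grapheme-level count.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitute/insert/delete cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance over the reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: code-point edit distance over the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


# Hypothetical reference/hypothesis pair for illustration only
reference = "ami bhat khai"
hypothesis = "ami bhat khabo"
print(f"WER={wer(reference, hypothesis):.3f}, CER={cer(reference, hypothesis):.3f}")
```

In practice, `hypothesis` would be the `text` produced by the transcription step, and `reference` the ground-truth transcript.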

	## Citation

	```
	@article{hasan2025banglatalk,
	title={BanglaTalk: Towards Real-Time Speech Assistance for Bengali Regional Dialects},
	author={Hasan, Jakir and Dipta, Shubhashis Roy},
	journal={arXiv preprint arXiv:2510.06188},
	year={2025}
	}

	@inproceedings{javed2022towards,
	title={Towards Building ASR Systems for the Next Billion Users},
	author={Javed, Tahir and Doddapaneni, Sumanth and Raman, Abhigyan and Bhogale, Kaushal Santosh and Ramesh, Gowtham and Kunchukuttan, Anoop and Kumar, Pratyush and Khapra, Mitesh M},
	booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
	volume={36},
	number={10},
	pages={10813--10821},
	year={2022}
	}
	```