Jakir057/BRDialect

Automatic Speech Recognition

Bengali

Jakir057 committed 9 days ago

Commit abeb08c (verified) · 1 parent: 32b485a

Update README.md

README.md +15 -56

@@ -19,21 +19,8 @@ BanglaTalk: Towards Real-Time Speech Assistance for Bengali Regional Dialects </
 </div>
 **BRDialect** - ASR system is trained on ten regional dialects of Bangladesh using the <a href="https://www.kaggle.com/competitions/ben10">Ben10</a> dataset from Bengali.AI.
-<!-- APT-Eval is the first and largest dataset to evaluate the AI-text detectors behavior for AI-polished texts.
-It contains almost **15K** text samples, polished by 5 different LLMs, for 6 different domains, with 2 major polishing types. All of these samples initially came from purely human written texts.
-It not only includes AI-polished texts, but also includes fine-grained involvement of AI/LLM.
-It is designed to push the boundary of AI-text detectors, for the scenarios where human uses LLM to minimally polish their own written texts. -->
-<!-- The overview of our dataset is given below --
-| **Polish Type**                           | **GPT-4o** | **Llama3.1-70B** | **Llama3-8B** | **Llama2-7B** | **DeepSeek-V3** | **Total** |
-|-------------------------------------------|------------|------------------|---------------|---------------|-- |-----------|
-| **no-polish / pure HWT**                  | -          | -                | -             | -             | - | 300       |
-| **Degree-based**                          | 1152       | 1085             | 1125          | 744           | 1141 | 4406      |
-| **Percentage-based**                      | 2072       | 2048             | 1977          | 1282          | 2078 | 7379      |
-| **Total**                                 | 3224       | 3133             | 3102          | 2026          | 3219 | **15004** | -->
-## Load the model
 **Prerequisite**<br>
 ```
@@ -43,20 +30,20 @@ It is designed to push the boundary of AI-text detectors, for the scenarios wher
 ```
 **Log in to HuggingFace**<br>
-```
 from huggingface_hub import login
 login("TOKEN")
 ```
 **Load base model and BRDialect**<br>
-```
 ## BRDialect
 from huggingface_hub import hf_hub_download
 kenlm_model_path = hf_hub_download(repo_id="Jakir057/BRDialect", filename="BRDialect/5gram_kenlm.arpa")
 state_dict_path = hf_hub_download(repo_id="Jakir057/BRDialect", filename="BRDialect/wav2vec2_bangla_regional_dialect.pth")
 ```
-```
 from transformers import AutoProcessor, AutoModelForCTC, Wav2Vec2ProcessorWithLM
 import torch
 import numpy as np
@@ -84,7 +71,7 @@ model.eval()
 ```
 ## Transcription Generation
-```
 sampling_rate = 16000
 path = "AUDIO_PATH"
 frame, sr = librosa.load(path, sr=sampling_rate, mono=True)
@@ -105,44 +92,6 @@ text = result.text
 print(f"Transcription={text}")
 ```
-<!-- ## Load the dataset
-To load the dataset, install the library `datasets` with `pip install datasets`. Then,
-```
-from datasets import load_dataset
-apt_eval_dataset = load_dataset("smksaha/apt-eval")
-```
-If you also want to access the original human written text samples, use this
-```
-from datasets import load_dataset
-dataset = load_dataset("smksaha/apt-eval", data_files={
-    "test": "merged_apt_eval_dataset.csv",
-    "original": "original.csv"
-})
-```  -->
-<!--
-## Data fields
-The RAID dataset has the following fields
-```
-1. `id`: A id that uniquely identifies each sample
-2. `polish_type`: The type of polishing that was used to generate this text sample
-    - Choices: `['degree-based', 'percentage-based']`
-3. `polishing_degree`: The degree of polishing that was used by the polisher to generate this text sample
-    - Choices: `["extreme_minor", "minor", "slight_major", "major"]`
-4. `polishing_percent`: The percetnage of original text was prompted to the polisher to generate this text sample
-    - Choices: `["1", "5", "10", "20", "35", "50", "75"]`
-5. `polisher`: The LLMs were used as polisher
-    - Choices: `["DeepSeek-V3", "GPT-4o", "Llama3.1-70B", "Llama3-8B", "Llama2-7B"]`
-6. `domain`: The genre from where the original human written text was taken
-    - Choices: `['blog', 'email_content', 'game_review', 'news', 'paper_abstract', 'speech']`
-7. `generation`: The text of the generation
-8. `sem_similarity`: The semantic similarity between polished text and original human written text
-9. `levenshtein_distance`: The levenshtein distance between polished text and original human written text
-10. `jaccard_distance`: The jaccard distance between polished text and original human written text
-``` -->
 ## Citation
 ```
@@ -152,4 +101,14 @@ The RAID dataset has the following fields
   journal={arXiv preprint arXiv:2510.06188},
   year={2025}
 }
 ```

 </div>
 **BRDialect** - ASR system is trained on ten regional dialects of Bangladesh using the <a href="https://www.kaggle.com/competitions/ben10">Ben10</a> dataset from Bengali.AI.
+## Load the BRDialect ASR System
 **Prerequisite**<br>
 ```
 ```
 **Log in to HuggingFace**<br>
+```python
 from huggingface_hub import login
 login("TOKEN")
 ```
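The login snippet above passes the token as a string literal. A minimal sketch of one alternative, assuming the token is exported in the `HF_TOKEN` environment variable, is shown below. The helper names `read_hf_token` and `hf_login` are hypothetical and not part of this README; `login(token=...)` is the `huggingface_hub` call already used above.

```python
import os


def read_hf_token(env_var="HF_TOKEN"):
    """Return the access token stored in env_var, or raise if it is missing."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return token


def hf_login(env_var="HF_TOKEN"):
    """Log in to the Hugging Face Hub with the token read from env_var."""
    # Imported lazily so read_hf_token() works without huggingface_hub installed.
    from huggingface_hub import login

    login(token=read_hf_token(env_var))


# Usage (hypothetical): run `export HF_TOKEN=...` in the shell, then call hf_login()
```

Keeping the token out of the source file avoids committing it to a repository by accident.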
 **Load base model and BRDialect**<br>
+```python
 ## BRDialect
 from huggingface_hub import hf_hub_download
 kenlm_model_path = hf_hub_download(repo_id="Jakir057/BRDialect", filename="BRDialect/5gram_kenlm.arpa")
 state_dict_path = hf_hub_download(repo_id="Jakir057/BRDialect", filename="BRDialect/wav2vec2_bangla_regional_dialect.pth")
 ```
+```python
 from transformers import AutoProcessor, AutoModelForCTC, Wav2Vec2ProcessorWithLM
 import torch
 import numpy as np
 ```
 ## Transcription Generation
+```python
 sampling_rate = 16000
 path = "AUDIO_PATH"
 frame, sr = librosa.load(path, sr=sampling_rate, mono=True)
 print(f"Transcription={text}")
 ```
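The transcription snippet above asks `librosa.load` for 16 kHz mono audio (`sr=sampling_rate, mono=True`), so recordings in other formats are converted on load. As a rough sketch, assuming the input is an uncompressed PCM WAV file, the hypothetical helper `describe_wav` below uses only the standard library to report what conversion a file would need; it is not part of the BRDialect code.

```python
import wave


def describe_wav(path, expected_rate=16000):
    """Summarize a PCM WAV file and flag whether it differs from 16 kHz mono."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        seconds = wav.getnframes() / float(rate)
    return {
        "sample_rate": rate,
        "channels": channels,
        "seconds": seconds,
        # True when the file's rate differs from the rate the snippet requests.
        "needs_resampling": rate != expected_rate,
        # True when the file has more than one channel and must be mixed down.
        "needs_downmix": channels != 1,
    }
```

Calling `describe_wav("AUDIO_PATH")` before transcription can help explain unexpected results, for example when a stereo 44.1 kHz recording is being resampled and mixed down.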
 ## Citation
 ```
   journal={arXiv preprint arXiv:2510.06188},
   year={2025}
 }
+@inproceedings{javed2022towards,
+  title={Towards building asr systems for the next billion users},
+  author={Javed, Tahir and Doddapaneni, Sumanth and Raman, Abhigyan and Bhogale, Kaushal Santosh and Ramesh, Gowtham and Kunchukuttan, Anoop and Kumar, Pratyush and Khapra, Mitesh M},
+  booktitle={Proceedings of the aaai conference on artificial intelligence},
+  volume={36},
+  number={10},
+  pages={10813--10821},
+  year={2022}
+}
 ```