Jakir057 committed · Commit abeb08c · verified · 1 parent: 32b485a

Update README.md

Files changed (1): README.md (+15 −56)
README.md CHANGED
@@ -19,21 +19,8 @@ BanglaTalk: Towards Real-Time Speech Assistance for Bengali Regional Dialects
  </div>

  **BRDialect** - ASR system is trained on ten regional dialects of Bangladesh using the <a href="https://www.kaggle.com/competitions/ben10">Ben10</a> dataset from Bengali.AI.
- <!-- APT-Eval is the first and largest dataset to evaluate the AI-text detectors behavior for AI-polished texts.
- It contains almost **15K** text samples, polished by 5 different LLMs, for 6 different domains, with 2 major polishing types. All of these samples initially came from purely human written texts.
- It not only includes AI-polished texts, but also includes fine-grained involvement of AI/LLM.
- It is designed to push the boundary of AI-text detectors, for the scenarios where human uses LLM to minimally polish their own written texts. -->

- <!-- The overview of our dataset is given below --
-
- | **Polish Type** | **GPT-4o** | **Llama3.1-70B** | **Llama3-8B** | **Llama2-7B** | **DeepSeek-V3** | **Total** |
- |-----------------|------------|------------------|---------------|---------------|-----------------|-----------|
- | **no-polish / pure HWT** | - | - | - | - | - | 300 |
- | **Degree-based** | 1152 | 1085 | 1125 | 744 | 1141 | 4406 |
- | **Percentage-based** | 2072 | 2048 | 1977 | 1282 | 2078 | 7379 |
- | **Total** | 3224 | 3133 | 3102 | 2026 | 3219 | **15004** | -->
-
- ## Load the model
+ ## Load the BRDialect ASR System

  **Prerequisite**<br>
  ```
@@ -43,20 +30,20 @@ It is designed to push the boundary of AI-text detectors, for the scenarios wher
  ```

  **Log in to HuggingFace**<br>
- ```
+ ```python
  from huggingface_hub import login
  login("TOKEN")
  ```

  **Load base model and BRDialect**<br>
- ```
+ ```python
  ## BRDialect
  from huggingface_hub import hf_hub_download

  kenlm_model_path = hf_hub_download(repo_id="Jakir057/BRDialect", filename="BRDialect/5gram_kenlm.arpa")
  state_dict_path = hf_hub_download(repo_id="Jakir057/BRDialect", filename="BRDialect/wav2vec2_bangla_regional_dialect.pth")
  ```
- ```
+ ```python
  from transformers import AutoProcessor, AutoModelForCTC, Wav2Vec2ProcessorWithLM
  import torch
  import numpy as np
@@ -84,7 +71,7 @@ model.eval()
  ```

  ## Transcription Generation
- ```
+ ```python
  sampling_rate = 16000
  path = "AUDIO_PATH"
  frame, sr = librosa.load(path, sr=sampling_rate, mono=True)
@@ -105,44 +92,6 @@ text = result.text
  print(f"Transcription={text}")
  ```

- <!-- ## Load the dataset
-
- To load the dataset, install the library `datasets` with `pip install datasets`. Then,
- ```
- from datasets import load_dataset
- apt_eval_dataset = load_dataset("smksaha/apt-eval")
- ```
-
- If you also want to access the original human written text samples, use this
- ```
- from datasets import load_dataset
- dataset = load_dataset("smksaha/apt-eval", data_files={
-     "test": "merged_apt_eval_dataset.csv",
-     "original": "original.csv"
- })
- ``` -->
- <!--
- ## Data fields
- The RAID dataset has the following fields
-
- ```
- 1. `id`: A id that uniquely identifies each sample
- 2. `polish_type`: The type of polishing that was used to generate this text sample
-    - Choices: `['degree-based', 'percentage-based']`
- 3. `polishing_degree`: The degree of polishing that was used by the polisher to generate this text sample
-    - Choices: `["extreme_minor", "minor", "slight_major", "major"]`
- 4. `polishing_percent`: The percetnage of original text was prompted to the polisher to generate this text sample
-    - Choices: `["1", "5", "10", "20", "35", "50", "75"]`
- 5. `polisher`: The LLMs were used as polisher
-    - Choices: `["DeepSeek-V3", "GPT-4o", "Llama3.1-70B", "Llama3-8B", "Llama2-7B"]`
- 6. `domain`: The genre from where the original human written text was taken
-    - Choices: `['blog', 'email_content', 'game_review', 'news', 'paper_abstract', 'speech']`
- 7. `generation`: The text of the generation
- 8. `sem_similarity`: The semantic similarity between polished text and original human written text
- 9. `levenshtein_distance`: The levenshtein distance between polished text and original human written text
- 10. `jaccard_distance`: The jaccard distance between polished text and original human written text
- ``` -->
-
  ## Citation

  ```
@@ -152,4 +101,14 @@ The RAID dataset has the following fields
  journal={arXiv preprint arXiv:2510.06188},
  year={2025}
  }
+
+ @inproceedings{javed2022towards,
+ title={Towards building asr systems for the next billion users},
+ author={Javed, Tahir and Doddapaneni, Sumanth and Raman, Abhigyan and Bhogale, Kaushal Santosh and Ramesh, Gowtham and Kunchukuttan, Anoop and Kumar, Pratyush and Khapra, Mitesh M},
+ booktitle={Proceedings of the aaai conference on artificial intelligence},
+ volume={36},
+ number={10},
+ pages={10813--10821},
+ year={2022}
+ }
  ```