Commit c1227dd (verified) · 1 parent: 06a7dcb
Committed by MahtaFetrat and nielsr (HF Staff)

Add library name, change pipeline tag (#1)

- Add library name, change pipeline tag (84ffe5908bd3918b4bed0e411e97377105a976b4)


Co-authored-by: Niels Rogge <[email protected]>

Files changed (1)
  1. README.md +8 -6
README.md CHANGED
@@ -1,4 +1,8 @@
  ---
+ datasets:
+ - MahtaFetrat/HomoRich-G2P-Persian
+ language:
+ - fa
  license: mit
  tags:
  - g2p
@@ -10,17 +14,15 @@ tags:
  - farsi
  - phonemization
  - homograph-disambiguation
- datasets:
- - MahtaFetrat/HomoRich-G2P-Persian
- language:
- - fa
+ library_name: transformers
+ pipeline_tag: text-to-speech
  ---
 
  # Homo-GE2PE: Persian Grapheme-to-Phoneme Conversion with Homograph Disambiguation
 
  ![Hugging Face](https://img.shields.io/badge/Hugging%20Face-Model-orange)
 
- **Homo-GE2PE** is a Persian grapheme-to-phoneme (G2P) model specialized in homograph disambiguation—words with identical spellings but context-dependent pronunciations (e.g., *مرد* pronounced as *mard* "man" or *mord* "died"). Introduced in *[Fast, Not Fancy: Rethinking G2P with Rich Data and Rule-Based Models](link)*, the model extends **GE2PE** by fine-tuning it on the **HomoRich** dataset, explicitly designed for such pronunciation challenges.
+ **Homo-GE2PE** is a Persian grapheme-to-phoneme (G2P) model specialized in homograph disambiguation—words with identical spellings but context-dependent pronunciations (e.g., *مرد* pronounced as *mard* "man" or *mord* "died"). Introduced in *[Fast, Not Fancy: Rethinking G2P with Rich Data and Rule-Based Models](https://huggingface.co/papers/2505.12973)*, the model extends **GE2PE** by fine-tuning it on the **HomoRich** dataset, explicitly designed for such pronunciation challenges.
 
  ---
 
@@ -132,4 +134,4 @@ Contributions and pull requests are welcome. Please open an issue to discuss the
  * [Base GE2PE Model](https://github.com/Sharif-SLPL/GE2PE)
  * [HomoRich Dataset (Huggingface)](https://huggingface.co/datasets/MahtaFetrat/HomoRich-G2P-Persian)
  * [HomoRich Dataset (Github)](https://github.com/MahtaFetrat/HomoRich-G2P-Persian)
- * [SentenceBench Persian G2P Benchmark](https://huggingface.co/datasets/MahtaFetrat/SentenceBench)
+ * [SentenceBench Persian G2P Benchmark](https://huggingface.co/datasets/MahtaFetrat/SentenceBench)
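
To check the effect of this commit, the updated front matter can be read back from the model card. The following is only a minimal sketch, assuming Python and a hand-written sample card. `parse_front_matter` and `CARD` are hypothetical names introduced for illustration. The parser handles only the flat `key: value` and `- item` layout seen in this diff, not full YAML, and the sample leaves out the tag lines that the diff does not show.

```python
def parse_front_matter(card_text):
    """Parse flat model-card front matter (simplified; not a full YAML parser)."""
    lines = card_text.splitlines()
    # Front matter must open with a '---' delimiter on the first line.
    if not lines or lines[0].strip() != "---":
        return {}
    metadata = {}
    current_key = None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":  # closing delimiter: stop before the card body
            break
        if not stripped:
            continue
        if stripped.startswith("- "):
            # List item: attach it to the most recent key that opened a list.
            if isinstance(metadata.get(current_key), list):
                metadata[current_key].append(stripped[2:].strip())
            continue
        key, _, value = stripped.partition(":")
        current_key = key.strip()
        value = value.strip()
        # A key with no inline value opens a list; otherwise store the scalar.
        metadata[current_key] = value if value else []
    return metadata


# Sample card built from the lines visible in the diff (hypothetical excerpt;
# tags hidden between the hunks are omitted).
CARD = """---
datasets:
- MahtaFetrat/HomoRich-G2P-Persian
language:
- fa
license: mit
tags:
- g2p
- farsi
- phonemization
- homograph-disambiguation
library_name: transformers
pipeline_tag: text-to-speech
---

# Homo-GE2PE: Persian Grapheme-to-Phoneme Conversion with Homograph Disambiguation
"""

metadata = parse_front_matter(CARD)
print(metadata["library_name"], metadata["pipeline_tag"])
```

For a real card, a proper YAML parser would be the safer choice. The hand-rolled version here just keeps the sketch free of dependencies.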