Add library name, change pipeline tag (#1)
- Add library name, change pipeline tag (84ffe5908bd3918b4bed0e411e97377105a976b4)
Co-authored-by: Niels Rogge <[email protected]>
README.md
CHANGED
@@ -1,4 +1,8 @@
 ---
+datasets:
+- MahtaFetrat/HomoRich-G2P-Persian
+language:
+- fa
 license: mit
 tags:
 - g2p
@@ -10,17 +14,15 @@ tags:
 - farsi
 - phonemization
 - homograph-disambiguation
-
-
-language:
-- fa
+library_name: transformers
+pipeline_tag: text-to-speech
 ---
 
 # Homo-GE2PE: Persian Grapheme-to-Phoneme Conversion with Homograph Disambiguation
 
 
 
-**Homo-GE2PE** is a Persian grapheme-to-phoneme (G2P) model specialized in homograph disambiguation—words with identical spellings but context-dependent pronunciations (e.g., *مرد* pronounced as *mard* "man" or *mord* "died"). Introduced in *[Fast, Not Fancy: Rethinking G2P with Rich Data and Rule-Based Models](
+**Homo-GE2PE** is a Persian grapheme-to-phoneme (G2P) model specialized in homograph disambiguation—words with identical spellings but context-dependent pronunciations (e.g., *مرد* pronounced as *mard* "man" or *mord* "died"). Introduced in *[Fast, Not Fancy: Rethinking G2P with Rich Data and Rule-Based Models](https://huggingface.co/papers/2505.12973)*, the model extends **GE2PE** by fine-tuning it on the **HomoRich** dataset, explicitly designed for such pronunciation challenges.
 
 ---
 
@@ -132,4 +134,4 @@ Contributions and pull requests are welcome. Please open an issue to discuss the
 * [Base GE2PE Model](https://github.com/Sharif-SLPL/GE2PE)
 * [HomoRich Dataset (Huggingface)](https://huggingface.co/datasets/MahtaFetrat/HomoRich-G2P-Persian)
 * [HomoRich Dataset (Github)](https://github.com/MahtaFetrat/HomoRich-G2P-Persian)
-* [SentenceBench Persian G2P Benchmark](https://huggingface.co/datasets/MahtaFetrat/SentenceBench)
+* [SentenceBench Persian G2P Benchmark](https://huggingface.co/datasets/MahtaFetrat/SentenceBench)
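This commit adds `datasets`, `language`, `library_name`, and `pipeline_tag` to the README's YAML front matter, which is the metadata block the Hugging Face Hub reads to label and filter the model. As a rough sanity check, the sketch below pulls those keys out of the new header. `parse_front_matter` is a hypothetical, stdlib-only helper that understands only the flat `key: value` / `- item` subset of YAML used here. It is not a general YAML parser and not a Hub API. The tag lines the diff does not show (new lines 9 to 13) are left out of the sample text.

```python
# Hypothetical helper: reads only the flat "key: value" / "- item" subset of
# YAML used in this model card's front matter. Not a general YAML parser.

def parse_front_matter(text: str) -> dict:
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("README does not start with a '---' front-matter block")
    meta: dict = {}
    current_key = None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":
            return meta  # closing delimiter ends the metadata block
        if not stripped:
            continue  # skip blank lines inside the block
        if stripped.startswith("- "):
            # List item: belongs to the most recent key, which must be a list
            if current_key is None or not isinstance(meta[current_key], list):
                raise ValueError(f"list item without a list key: {line!r}")
            meta[current_key].append(stripped[2:].strip())
        else:
            key, _, value = stripped.partition(":")
            current_key = key.strip()
            # An empty value means a "- item" list follows on the next lines
            meta[current_key] = value.strip() or []
    raise ValueError("front-matter block is not closed with '---'")


# Front matter after this commit, taken from the diff above.
# Assumption: the tags hidden between the two hunks are omitted here.
NEW_README = """---
datasets:
- MahtaFetrat/HomoRich-G2P-Persian
language:
- fa
license: mit
tags:
- g2p
- farsi
- phonemization
- homograph-disambiguation
library_name: transformers
pipeline_tag: text-to-speech
---
"""

meta = parse_front_matter(NEW_README)
print(meta["library_name"], meta["pipeline_tag"])  # transformers text-to-speech
```

For a real model card, a full YAML parser is the safer choice. This helper only illustrates which keys the commit introduces.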