Upload folder using huggingface_hub
README.md (changed)

@@ -1,4 +1,11 @@
 ---
+language: te
+tags:
+- telugu
+- tokenizer
+- bpe
+license: mit
+---
 # Telugu Tokenizer
 
 A Unigram tokenizer specifically trained for the Telugu language using a large corpus of Telugu text from Wikipedia and news sources. This tokenizer is designed to efficiently handle Telugu text while maintaining high compression ratios.
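The README mentions "high compression ratios" but does not say how that is measured. One common measure is the average number of characters covered by each token: more characters per token means the same text is encoded in fewer tokens. The sketch below assumes that definition. `chars_per_token` and `whitespace_tokenize` are hypothetical helpers, and the whitespace splitter is only a placeholder for the Unigram tokenizer itself, which would normally be loaded from the uploaded files.

```python
def chars_per_token(text: str, tokens: list[str]) -> float:
    """Average number of Unicode code points per token (assumed metric).

    Higher values mean the tokenizer packs more text into each token.
    """
    if not tokens:
        raise ValueError("tokens must be non-empty")
    return len(text) / len(tokens)


def whitespace_tokenize(text: str) -> list[str]:
    # Hypothetical placeholder: splits on whitespace only. This is NOT the
    # Unigram tokenizer described in the README.
    return text.split()


sample = "తెలుగు భాష"
tokens = whitespace_tokenize(sample)
print(f"{len(tokens)} tokens, {chars_per_token(sample, tokens):.2f} chars/token")
```

To compare tokenizers, the same text would be passed through each one and the resulting ratios compared. Note that `len` counts Unicode code points, so a Telugu syllable written with a vowel sign (for example "తె") counts as two characters.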