Saiteja committed on
Commit 62c3ee3 · verified · 1 Parent(s): 2cee856

Upload folder using huggingface_hub

Files changed (1)
  1. README.md +7 -0
README.md CHANGED
@@ -1,4 +1,11 @@
 ---
+language: te
+tags:
+- telugu
+- tokenizer
+- bpe
+license: mit
+---
 # Telugu Tokenizer
 
 A Unigram tokenizer specifically trained for the Telugu language using a large corpus of Telugu text from Wikipedia and news sources. This tokenizer is designed to efficiently handle Telugu text while maintaining high compression ratios.