# Turkish BPE Tokenizer

## Model Description
This is a Byte-Pair Encoding (BPE) tokenizer trained specifically for the Turkish language. It is designed for Turkish language model pretraining and other Turkish NLP tasks.

### Key Features
- Trained on a diverse Turkish text corpus
- Vocabulary size of 50,000
- Minimum pair frequency (`min_frequency`) of 2
- Special handling for Turkish characters (ğ, ü, ş, ı, ö, ç)
- Added special tokens: `[EOS]`, `[SEP]`, `[UNK]`, `[MASK]`, `[PAD]`
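A training setup matching the configuration above can be sketched with the Hugging Face `tokenizers` library. This is a minimal illustration, not the actual training script: the pre-tokenizer choice and the tiny in-memory corpus are assumptions, and the real tokenizer was trained on a much larger Turkish corpus.

```python
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import ByteLevel
from tokenizers.trainers import BpeTrainer

# Byte-level BPE represents any input as bytes, so Turkish characters
# (ğ, ü, ş, ı, ö, ç) never fall out of the base alphabet.
tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = ByteLevel()

trainer = BpeTrainer(
    vocab_size=50_000,  # target vocabulary size from the model card
    min_frequency=2,    # only merge pairs seen at least twice
    special_tokens=["[PAD]", "[UNK]", "[EOS]", "[SEP]", "[MASK]"],
)

# Illustrative two-sentence corpus; replace with a real Turkish text corpus.
corpus = [
    "Bu bir Türkçe örnek cümledir.",
    "Türkçe doğal dil işleme için eğitilmiş bir BPE modelidir.",
]
tokenizer.train_from_iterator(corpus, trainer=trainer)
```

With a real corpus the same script scales by streaming lines from disk into `train_from_iterator`.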
## Intended Uses & Limitations

### How to Use
```python
from transformers import AutoTokenizer

# Load the tokenizer from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("abakirci/admbkrc-turkish-tokenizer")

text = "Bu bir Türkçe örnek cümledir."
encoded = tokenizer(text, return_tensors="pt")
```