Really cool post! In particular this was eye-opening to me: However, I would consider both Unicode and UTF-8 to be tokenizers.
Article: There is no such thing as a tokenizer-free lunch, by catherinearnett • Sep 25 • 84
Article: An Analysis of Multilingual Models on Hugging Face, by catherinearnett and 1 other • Sep 18 • 4