Hugging Face
Catherine Arnett
catherinearnett
111 followers · 38 following
https://catherinearnett.github.io/
linguist_cat
catherinearnett
catherinearnett.bsky.social
AI & ML interests
multilingual NLP, tokenization
Recent Activity
updated a dataset 15 minutes ago: catherinearnett/old_church_slavonic
published a dataset 19 minutes ago: catherinearnett/old_church_slavonic
updated a collection about 11 hours ago: Low Resource Language Datasets
catherinearnett's datasets (8)
catherinearnett/bilingual_tokenizers • Updated 2 minutes ago • 143
catherinearnett/old_church_slavonic • Viewer • Updated 15 minutes ago • 257k
catherinearnett/gheg_albanian • Viewer • Updated about 11 hours ago • 3.12k
catherinearnett/classical_armenian_pd • Viewer • Updated 14 days ago • 102 • 64
catherinearnett/bilingual-tokenizer-training-data • Viewer • Updated Feb 21 • 30.7M • 966
catherinearnett/montok • Updated Sep 19, 2025 • 6.96k • 3
catherinearnett/morphscore • Viewer • Updated Jul 10, 2025 • 5.09M • 201 • 4
catherinearnett/monolingual-tokenizer-data • Viewer • Updated May 15, 2025 • 139M • 434 • 1