Hugging Face

Catherine Arnett

catherinearnett
https://catherinearnett.github.io/
  • linguist_cat
  • catherinearnett
  • catherinearnett.bsky.social

AI & ML interests

multilingual NLP, tokenization

Recent Activity

updated a dataset 15 minutes ago
catherinearnett/old_church_slavonic
published a dataset 19 minutes ago
catherinearnett/old_church_slavonic
updated a collection about 11 hours ago
Low Resource Language Datasets

Organizations

  • Blog-explorers
  • Language and Cognition Lab (UCSD)
  • Common Crawl Foundation
  • Beetles

catherinearnett's datasets (8)

catherinearnett/bilingual_tokenizers

Updated 2 minutes ago • 143

catherinearnett/old_church_slavonic

Viewer • Updated 15 minutes ago • 257k

catherinearnett/gheg_albanian

Viewer • Updated about 11 hours ago • 3.12k

catherinearnett/classical_armenian_pd

Viewer • Updated 14 days ago • 102 • 64

catherinearnett/bilingual-tokenizer-training-data

Viewer • Updated Feb 21 • 30.7M • 966

catherinearnett/montok

Updated Sep 19, 2025 • 6.96k • 3

catherinearnett/morphscore

Viewer • Updated Jul 10, 2025 • 5.09M • 201 • 4

catherinearnett/monolingual-tokenizer-data

Viewer • Updated May 15, 2025 • 139M • 434 • 1