IPA-CHILDES & G2P+: Feature-Rich Resources for Cross-Lingual Phonology and Phonemic Language Modeling
Paper • 2504.03036
The IPA-CHILDES dataset, along with the models and tokenizers used for phoneme-based language modeling, covering 31 languages in CHILDES.
Note: Tokenizers for each of the 31 languages in IPA-CHILDES.
Note: Models trained on 11 languages in IPA-CHILDES.
Note: 108 models trained on the EnglishNA portion of IPA-CHILDES to establish scaling behaviours of phoneme LMs.