Update README.md
README.md
CHANGED
@@ -33,7 +33,9 @@ language:
 
 # CHILDES IPA Tokenizers
 
-Tokenizers for each language in [IPA-CHILDES](https://huggingface.co/datasets/phonemetransformers/IPA-CHILDES) used to train cross-lingual phoneme LLMs in our
+Tokenizers for each language in [IPA-CHILDES](https://huggingface.co/datasets/phonemetransformers/IPA-CHILDES) used to train cross-lingual phoneme LLMs in our papers:
+- [IPA-CHILDES & G2P+: Feature-Rich Resources for Cross-Lingual Phonology and Phonemic Language Modeling](https://arxiv.org/abs/2504.03036)
+- [BabyLM's First Words: Word Segmentation as a Phonological Probing Task](https://arxiv.org/abs/2504.03338)
 
 Scripts for creating the tokenizers can be found [here](https://github.com/codebyzeb/childes-processor).
 Scripts for training models using these tokenizers can be found [here](https://github.com/codebyzeb/PhonemeTransformers).