---
datasets:
- phonemetransformers/IPA-CHILDES
language:
- zh
- nl
- en
- et
- fr
- de
- id
- sr
- es
- ja
---

# IPA CHILDES Models

Phoneme-based GPT-2 models trained on the largest 11 sections of the [IPA-CHILDES](https://huggingface.co/datasets/phonemetransformers/IPA-CHILDES) dataset for our paper [IPA-CHILDES & G2P+: Feature-Rich Resources for Cross-Lingual Phonology and Phonemic Language Modeling](https://arxiv.org/abs/2504.03036).

All models have 5M non-embedding parameters and were each trained on 1.8M tokens of their language. The trained models were then probed for phonetic features using the corresponding phoneme inventories in [Phoible](https://phoible.org/). See the paper for details. Training and analysis scripts can be found [here](https://github.com/codebyzeb/PhonemeTransformers).

To load a model:

```python
from transformers import AutoModel

dutch_model = AutoModel.from_pretrained('phonemetransformers/ipa-childes-models', subfolder='Dutch')
```