---
datasets:
- phonemetransformers/IPA-CHILDES
language:
- zh
- nl
- en
- et
- fr
- de
- id
- sr
- es
- ja
base_model:
- openai-community/gpt2
---
# IPA CHILDES Models
GPT-2 models trained on 11 of the languages in IPA-CHILDES.

All models have 5M non-embedding parameters and were trained on 1.8M tokens of their respective language. The trained models were then probed for phonetic features using the corresponding phoneme inventories from PHOIBLE. See the paper for more details. Training and analysis scripts can be found here.
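The "5M non-embedding parameters" figure can be sanity-checked from the standard GPT-2 block structure. A minimal sketch follows; the actual width and depth of these models are not stated in this card, so the example configurations below are purely illustrative:

```python
# Back-of-the-envelope count of non-embedding parameters for a
# GPT-2-style transformer. The exact hidden size / layer count of the
# IPA-CHILDES models is not given here, so the configs are hypothetical.

def non_embedding_params(d_model: int, n_layer: int) -> int:
    """Count weights outside the token/position embeddings.

    Per block: attention (c_attn: 3*d^2 + 3d, c_proj: d^2 + d),
    MLP with 4x expansion (c_fc: 4*d^2 + 4d, c_proj: 4*d^2 + d),
    and two LayerNorms (4d); plus one final LayerNorm (2d).
    """
    per_block = 12 * d_model**2 + 13 * d_model
    return n_layer * per_block + 2 * d_model

# Two hypothetical configs landing near the 5M mark:
print(non_embedding_params(256, 6))  # 4739072, about 4.7M
print(non_embedding_params(256, 8))  # 6318592, about 6.3M
```

Note that embedding parameters are excluded deliberately: for small models with phoneme-level vocabularies, the embedding tables can otherwise dominate the parameter count and make cross-model comparisons misleading.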