phonemetransformers/ipa-childes-models-tiny


codebyzeb committed on Apr 3

Commit 668b32c (verified) · 1 parent: 9e9685b

Create README.md

Files changed (1)

README.md (added) +44 -0

	@@ -0,0 +1,44 @@

+---
+datasets:
+- phonemetransformers/IPA-CHILDES
+language:
+- en
+- eu
+- zh
+- da
+- nl
+- hr
+- es
+- et
+- fa
+- fr
+- de
+- hu
+- is
+- id
+- ga
+- it
+- ja
+- ko
+- pt
+- pl
+- qu
+- ro
+- sr
+- sv
+- tr
+- cy
+- 'no'
+---
+# IPA CHILDES Models: Tiny
+Phoneme-based GPT-2 models, one for each of the 31 sections of the [IPA-CHILDES](https://huggingface.co/datasets/phonemetransformers/IPA-CHILDES) dataset, trained for the paper [BabyLM's First Words: Word Segmentation as a Phonological Probing Task]().
+
+Each model has 600k non-embedding parameters and was trained on 100k tokens of its language. The models were evaluated for phonological knowledge using the *word segmentation* task; see the paper for details. Training and analysis scripts are available in the [PhonemeTransformers](https://github.com/codebyzeb/PhonemeTransformers) repository.
+
+To load a model:
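+```python
+from transformers import AutoModel
+
+# The `subfolder` argument selects a language's model (here, Farsi).
+# AutoModel returns the base GPT-2 transformer (hidden states only);
+# use AutoModelForCausalLM instead if you need next-phoneme logits.
+farsi_model = AutoModel.from_pretrained('phonemetransformers/ipa-childes-models-tiny', subfolder='Farsi')
+```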
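As a rough illustration of word segmentation used as a probe, here is a toy sketch. It assumes a hypothetical per-phoneme cue, such as the model's surprisal at each phoneme, and a simple peak-picking rule that places a word boundary wherever the cue is a local maximum. The predicted boundaries are then scored against gold word boundaries with boundary F1. The cue values, the peak rule, and the names `peak_boundaries` and `boundary_f1` are illustrative assumptions rather than the paper's exact method; the paper describes the cues and metrics actually used.

```python
from typing import List, Set


def peak_boundaries(cue: List[float]) -> Set[int]:
    """Return indices i where cue[i] is a strict local peak.

    cue[i] is a hypothetical per-phoneme score (e.g. surprisal at phoneme i).
    A boundary at i means a new word starts at phoneme i. Index 0 is the
    utterance start, so it is never predicted or scored.
    """
    boundaries = set()
    for i in range(1, len(cue)):
        left = cue[i - 1]
        # Treat the position past the last phoneme as -inf, so a final rise counts as a peak.
        right = cue[i + 1] if i + 1 < len(cue) else float("-inf")
        if cue[i] > left and cue[i] > right:
            boundaries.add(i)
    return boundaries


def boundary_f1(predicted: Set[int], gold: Set[int]) -> float:
    """Harmonic mean of boundary precision and boundary recall."""
    hits = len(predicted & gold)
    if hits == 0:
        return 0.0
    precision = hits / len(predicted)
    recall = hits / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: "the cat" as /ð ə k æ t/, with the gold word start at index 2.
cue = [3.0, 1.0, 4.0, 2.0, 1.5]
pred = peak_boundaries(cue)
print(pred, boundary_f1(pred, {2}))  # {2} 1.0
```

Peak-picking needs no training, which makes it a cheap way to ask whether a model's predictions already carry word-boundary information. A learned probe over hidden states is a common alternative.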