phonemetransformers/ipa-childes-models-small


codebyzeb committed about 1 month ago

Commit aa7c4cc (verified) · 1 Parent(s): a93858a

Create README.md

Files changed (1)

README.md +31 -0


	@@ -0,0 +1,31 @@

+---
+datasets:
+- phonemetransformers/IPA-CHILDES
+language:
+- zh
+- nl
+- en
+- et
+- fr
+- de
+- id
+- sr
+- es
+- ja
+- it
+- ko
+- pl
+- pt
+- sv
+---
+# IPA CHILDES Models: Small
+Phoneme-based GPT-2 models trained on the 17 largest sections of the [IPA-CHILDES](https://huggingface.co/datasets/phonemetransformers/IPA-CHILDES) dataset for the paper [BabyLM's First Words: Word Segmentation as a Phonological Probing Task]().
+Each model has 800k non-embedding parameters and was trained on 700k tokens of its language. The models were evaluated for phonological knowledge using the *word segmentation* task; see the paper for details. Training and analysis scripts are available in the [PhonemeTransformers repository](https://github.com/codebyzeb/PhonemeTransformers).
+To load a model:
+```python
+from transformers import AutoModel
+# Load the French model from its subfolder of this repository
+french_model = AutoModel.from_pretrained('phonemetransformers/ipa-childes-models-small', subfolder='French')
+```
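
The README's "800k non-embedding parameters" figure can be related to GPT-2 hyperparameters with a short calculation. The sketch below is not part of the commit and does not reproduce the authors' configuration. It assumes a standard GPT-2 block (fused QKV projection, a 4x MLP expansion, biases on every linear layer, two layer norms per block plus a final one). The function name `gpt2_non_embedding_params` and the width and depth in the second call are hypothetical, chosen only to show one shape that lands near 800k.

```python
def gpt2_non_embedding_params(d_model: int, n_layer: int, mlp_ratio: int = 4) -> int:
    """Count weights and biases in GPT-2 style blocks, excluding all embeddings.

    Assumes the standard GPT-2 layout; the models in this repository
    may use a different configuration.
    """
    # Attention: fused QKV projection plus the output projection (with biases)
    attn = (d_model * 3 * d_model + 3 * d_model) + (d_model * d_model + d_model)
    # MLP: up-projection and down-projection (with biases)
    d_ff = mlp_ratio * d_model
    mlp = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
    # Two layer norms per block, each with a scale and a bias vector
    layer_norms = 2 * (2 * d_model)
    per_block = attn + mlp + layer_norms
    # One final layer norm after the last block
    final_layer_norm = 2 * d_model
    return n_layer * per_block + final_layer_norm


# Sanity check against the original GPT-2 small shape (768 wide, 12 layers)
print(gpt2_non_embedding_params(768, 12))  # 85056000

# Hypothetical shape, NOT taken from the source, that falls near 800k
print(gpt2_non_embedding_params(128, 4))  # 793344
```

Token and position embeddings are excluded from the count, so the figure is independent of the phoneme vocabulary size and of the maximum sequence length.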