Commit aa7c4cc · codebyzeb · verified · 1 Parent(s): a93858a

Create README.md

Files changed (1)
  1. README.md +31 -0
README.md ADDED
@@ -0,0 +1,31 @@
+ ---
+ datasets:
+ - phonemetransformers/IPA-CHILDES
+ language:
+ - zh
+ - nl
+ - en
+ - et
+ - fr
+ - de
+ - id
+ - sr
+ - es
+ - ja
+ - it
+ - ko
+ - pl
+ - pt
+ - sv
+ ---
+ # IPA CHILDES Models: Small
+
+ Phoneme-based GPT-2 models trained on the 17 largest sections of the [IPA-CHILDES](https://huggingface.co/datasets/phonemetransformers/IPA-CHILDES) dataset for the paper [BabyLM's First Words: Word Segmentation as a Phonological Probing Task]().
+
+ Each model has 800k non-embedding parameters and was trained on 700k tokens of its language. The models were evaluated for phonological knowledge using the *word segmentation* task; see the paper for details. Training and analysis scripts are available in the [PhonemeTransformers repository](https://github.com/codebyzeb/PhonemeTransformers).
+
+ To load a model:
+ ```python
+ from transformers import AutoModel
+ french_model = AutoModel.from_pretrained('phonemetransformers/ipa-childes-models-small', subfolder='French')
+ ```
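
The README says the models were evaluated with a word segmentation task but does not describe it. As a rough illustration only, the sketch below assumes that a word boundary is placed before any phoneme whose surprisal under the model exceeds a fixed threshold. The function name `segment_by_surprisal`, the threshold, and the surprisal values are hypothetical; the paper and the PhonemeTransformers repository may use a different procedure.

```python
from typing import List, Sequence


def segment_by_surprisal(
    phonemes: Sequence[str],
    surprisal: Sequence[float],
    threshold: float,
) -> List[List[str]]:
    """Split a phoneme sequence into candidate words.

    Assumption (not from the paper): a boundary is placed before a phoneme
    whenever its surprisal is strictly greater than `threshold`.
    """
    if len(phonemes) != len(surprisal):
        raise ValueError("phonemes and surprisal must have the same length")

    words: List[List[str]] = []
    current: List[str] = []
    for phoneme, value in zip(phonemes, surprisal):
        # A high-surprisal phoneme starts a new word, unless it is the
        # first phoneme of the utterance (nothing to close yet).
        if current and value > threshold:
            words.append(current)
            current = []
        current.append(phoneme)
    if current:
        words.append(current)
    return words


# Hypothetical utterance and made-up surprisal values, for illustration only.
phonemes = ["ð", "ə", "d", "ɔ", "g"]
surprisal = [5.1, 0.8, 4.7, 1.2, 0.9]
words = segment_by_surprisal(phonemes, surprisal, threshold=3.0)
print(words)
```

In practice the surprisal values would come from a loaded model's next-phoneme predictions rather than being written by hand, and the threshold would need to be tuned per language.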