codebyzeb committed
Commit 668b32c (verified)
1 Parent(s): 9e9685b

Create README.md

Files changed (1)
  1. README.md +44 -0
README.md ADDED
@@ -0,0 +1,44 @@
+ ---
+ datasets:
+ - phonemetransformers/IPA-CHILDES
+ language:
+ - en
+ - eu
+ - zh
+ - da
+ - nl
+ - hr
+ - es
+ - et
+ - fa
+ - fr
+ - de
+ - hu
+ - is
+ - id
+ - ga
+ - it
+ - ja
+ - ko
+ - pt
+ - pl
+ - qu
+ - ro
+ - sr
+ - sv
+ - tr
+ - cy
+ - 'no'
+ ---
+
+ # IPA CHILDES Models: Tiny
+
+ Phoneme-based GPT-2 models trained on all 31 sections of the [IPA-CHILDES](https://huggingface.co/datasets/phonemetransformers/IPA-CHILDES) dataset for the paper [BabyLM's First Words: Word Segmentation as a Phonological Probing Task]().
+
+ Each model has 600k non-embedding parameters and was trained on 100k tokens of its language. The models were evaluated for phonological knowledge using the *word segmentation* task; see the paper for details. Training and analysis scripts are available in the [PhonemeTransformers repository](https://github.com/codebyzeb/PhonemeTransformers).
+
+ To load a model, pass the section name as `subfolder`:
+ ```python
+ from transformers import AutoModel
+
+ # The Farsi model is stored in the 'Farsi' subfolder of the repository.
+ farsi_model = AutoModel.from_pretrained('phonemetransformers/ipa-childes-models-tiny', subfolder='Farsi')
+ ```
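
The loading call above can be wrapped in a small helper when several sections are needed. The sketch below is illustrative only. It assumes each section lives in its own subfolder, as `'Farsi'` does, and that the checkpoints also load with `AutoModelForCausalLM` (the README itself uses `AutoModel`). The names `REPO_ID`, `load_section`, and `load_sections` are hypothetical and not part of the repository.

```python
from typing import Callable, Dict, Iterable, Optional

# Repository name taken from the README.
REPO_ID = "phonemetransformers/ipa-childes-models-tiny"


def load_section(section: str, loader: Optional[Callable] = None):
    """Load the model stored in one section's subfolder.

    `loader` is a hypothetical hook so the helper can be exercised offline;
    by default it is AutoModelForCausalLM.from_pretrained (an assumption:
    the README only shows AutoModel).
    """
    if not section:
        raise ValueError("section must be a non-empty subfolder name")
    if loader is None:
        # Imported lazily so this module can be imported without transformers.
        from transformers import AutoModelForCausalLM

        loader = AutoModelForCausalLM.from_pretrained
    return loader(REPO_ID, subfolder=section)


def load_sections(sections: Iterable[str], loader: Optional[Callable] = None) -> Dict[str, object]:
    """Load several sections and return them keyed by subfolder name."""
    return {name: load_section(name, loader=loader) for name in sections}


# Example (downloads weights, so it is left commented out):
# models = load_sections(["Farsi"])
```

Only `'Farsi'` is named in the README, so check the repository's file listing for the exact subfolder names of the other sections before passing them in.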