Update README.md
README.md
CHANGED
@@ -13,10 +13,12 @@ A not-so-state-of-the-art 60M parameter transformer model.
 Uses the olmo default architecture.
 
 ### Specs
-
-
-
-
+Heads: 8
+Layers: 8
+Dimension model: 512
+Dimension mlp: 4096
+
+eval/v3-small-c4_en-validation/Perplexity: 40.33
 
 ### Training Data
 Pretraining:
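
As a rough sanity check on the specs added in this change, the sketch below estimates the parameter count from the layer count, model dimension, and MLP dimension. It rests on assumptions not stated in the README: a vocabulary padded to 50304 tokens, tied input/output embeddings, a gated (SwiGLU-style) MLP whose 4096-wide up-projection is split in half before the down-projection, and no bias or norm parameters. `estimate_params` is a hypothetical helper, not part of any library. The head count does not change the total; it only sets the per-head width (512 / 8 = 64).

```python
def estimate_params(d_model, n_layers, mlp_hidden, vocab_size,
                    gated_mlp=True, tied_embeddings=True):
    """Rough transformer parameter estimate (weights only, no biases or norms)."""
    # Attention: Q, K, V and output projections, each d_model x d_model.
    attn = 4 * d_model * d_model
    if gated_mlp:
        # Assumption: the up-projection produces mlp_hidden features that are
        # split into a gate half and a value half, so the down-projection
        # only sees mlp_hidden // 2 inputs.
        mlp = d_model * mlp_hidden + (mlp_hidden // 2) * d_model
    else:
        # Plain two-matrix MLP: up-projection and down-projection.
        mlp = 2 * d_model * mlp_hidden
    # Token embedding; counted twice if the output head is not tied to it.
    emb = vocab_size * d_model * (1 if tied_embeddings else 2)
    return n_layers * (attn + mlp) + emb


# Specs from the README; vocab_size=50304 is an assumed value.
total = estimate_params(d_model=512, n_layers=8, mlp_hidden=4096, vocab_size=50304)
print(f"{total / 1e6:.1f}M parameters")
```

Under these assumptions the estimate comes out near the 60M figure in the model description. With an ungated MLP it is noticeably higher, so the result is sensitive to which MLP variant the architecture actually uses.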