Update README.md
README.md CHANGED
@@ -96,7 +96,7 @@ We used the BabyLM 10M (Strict-small) dataset to train the model. It is composed
| Sequence Length | 128 → 512 |
| Batch Size (in tokens) | 16 384 |
| Learning Rate | 0.007 |
-| Number of Steps |
+| Number of Steps | 9 914 |
| Warmup Ratio | 1.6% |
| Cooldown Ratio | 1.6% |
| Mask Ratio | 0.3 → 0.15 |
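
The added "Number of Steps" value also pins down the warmup and cooldown lengths implied by the ratios in the same table. Below is a minimal Python sketch of that arithmetic, assuming the 1.6% ratios are fractions of the total optimizer steps and an illustrative linear warmup / constant / linear cooldown shape; the schedule shape and the names used here are assumptions, not taken from the training code.

```python
# Hedged sketch: derive warmup/cooldown step counts from the README table,
# assuming the ratios are fractions of the total number of optimizer steps.
# The schedule shape below is illustrative only and not specified by the table.

TOTAL_STEPS = 9_914      # "Number of Steps" added in this commit
WARMUP_RATIO = 0.016     # "Warmup Ratio" (1.6%)
COOLDOWN_RATIO = 0.016   # "Cooldown Ratio" (1.6%)
PEAK_LR = 0.007          # "Learning Rate"

warmup_steps = int(TOTAL_STEPS * WARMUP_RATIO)      # ~158 steps
cooldown_steps = int(TOTAL_STEPS * COOLDOWN_RATIO)  # ~158 steps

def lr_at(step: int) -> float:
    """Linear warmup, constant peak, then linear cooldown (illustrative)."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    if step >= TOTAL_STEPS - cooldown_steps:
        remaining = TOTAL_STEPS - step
        return PEAK_LR * remaining / max(1, cooldown_steps)
    return PEAK_LR

if __name__ == "__main__":
    for s in (0, warmup_steps, TOTAL_STEPS // 2, TOTAL_STEPS - 1):
        print(f"step {s:5d}: lr = {lr_at(s):.5f}")
```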