Core model results for OLMo 2 32B are found below.
|                     | **OLMo 2 32B** | **OLMo 2 13B** | **OLMo 2 7B** |
|-------------------|------------|------------|------------|
| Pretraining Stage 1 | 6 trillion tokens<br>(1.5 epochs) | 5 trillion tokens<br>(1.2 epochs) | 4 trillion tokens<br>(1 epoch) |
| Pretraining Stage 2 | 100B tokens (3 runs)<br>300B tokens (1 run)<br>*merged* | 100B tokens (3 runs)<br>300B tokens (1 run)<br>*merged* | 50B tokens (3 runs)<br>*merged* |
| Post-training | SFT + DPO + GRPO<br>([preference mix](https://huggingface.co/datasets/allenai/olmo-2-32b-pref-mix-v1)) | SFT + DPO + PPO<br>([preference mix](https://huggingface.co/datasets/allenai/olmo-2-1124-13b-preference-mix)) | SFT + DPO + PPO<br>([preference mix](https://huggingface.co/datasets/allenai/olmo-2-1124-7b-preference-mix)) |
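The epoch figures in the table are approximately the stage-1 token budget divided by the dataset size, truncated to one decimal place. A quick sanity check, assuming all three stage-1 runs draw on the ~3.9T-token OLMo-mix-1124 corpus (the `epochs` helper below is illustrative, not part of the OLMo codebase):

```python
import math

DATASET_TOKENS = 3.9e12  # OLMo-mix-1124 size, per the Stage 1 notes below


def epochs(token_budget: float) -> float:
    """Approximate epoch count: token budget / dataset size, truncated to one decimal."""
    return math.floor(token_budget / DATASET_TOKENS * 10) / 10


print(epochs(6e12))  # 32B: 1.5 epochs
print(epochs(5e12))  # 13B: 1.2 epochs
print(epochs(4e12))  # 7B: 1.0 epoch
```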

#### Stage 1: Initial Pretraining
- Dataset: [OLMo-mix-1124](https://huggingface.co/datasets/allenai/olmo-mix-1124) (3.9T tokens)