Update README.md
README.md
@@ -214,11 +214,11 @@ a:hover {text-decoration: underline;}
 <div class="section-content">
 <p>The model first went through SFT with a small synthetic dataset of 2.9 million tokens, approximately 750 conversations. Primarily RP data with small amounts of random instruct / assistant data and creative writing.</p>
 <p>The model then went through DPO training using approx 1100 chosen examples from the SFT dataset that were of exceptional quality or showed verifiable instruction following. Rejected samples were generated using another Llama 3.3 finetune that is known for poor instruction following.</p>
-<h3 class="subheading">Axolotl configs</h3>
-<p>Neither are optimized for cost / performance efficiency, YMMV.</p>
 </div>
 </div>
 </div>
+<h3 class="subheading">Axolotl configs</h3>
+<p>Neither are optimized for cost / performance efficiency, YMMV.</p>
 <h3>SFT 1*H200</h3>
 
 ```yml
@@ -329,8 +329,8 @@ save_safetensors: true
 # WANDB TRACKING
 # ====================
 wandb_project: project_name
-wandb_entity: your_entity
-wandb_name: your_run_name
+# wandb_entity: your_entity
+# wandb_name: your_run_name
 
 ```
 <h3>DPO 2*H200</h3>