Upload README.md with huggingface_hub
README.md
CHANGED
@@ -59,7 +59,7 @@ print(gen(
 
 Discord-Micae-Hermes-3-3B is a new finetune on [NousResearch/Hermes-3-Llama-3.2-3B](https://huggingface.co/NousResearch/Hermes-3-Llama-3.2-3B).
 
-The model was trained on 17 million tokens from 250 thousand Discord STX (single-turn exchanges) for 6 epochs and 5.5 million tokens from 100 thousand multi-turn chains for 6 epochs at a learning rate of 2e-5, finishing with both datasets combined for 1 epoch at 1e-5. We used a cosine warmup with 220 warmup steps for each phase.
+The model was trained on 17 million tokens from 250 thousand Discord STX (single-turn exchanges) for 6 epochs and 5.5 million tokens from 100 thousand multi-turn chains for 6 epochs at a learning rate of 2e-5, finishing with both datasets combined for 1 epoch at 1e-5. We used a cosine warmup with 220 warmup steps for each phase. The LoRA adapter was trained with alpha = 32 and r = 8.
 
 ## Dataset
 
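For reference, a minimal sketch of a training setup matching the hyperparameters added in this change (r = 8, alpha = 32, learning rate 2e-5, cosine schedule with 220 warmup steps, 6 epochs per phase), assuming Hugging Face `peft` and `transformers`. The target modules, output directory, and other arguments are illustrative assumptions, not values stated in the README:

```python
# Hypothetical sketch of the LoRA configuration described above.
# Only r=8, lora_alpha=32, lr=2e-5, the cosine schedule, 220 warmup
# steps, and 6 epochs come from the README; the rest is assumed.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, TrainingArguments

model = AutoModelForCausalLM.from_pretrained(
    "NousResearch/Hermes-3-Llama-3.2-3B"
)

lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "v_proj"],  # assumed; not stated in the README
)
model = get_peft_model(model, lora_config)

training_args = TrainingArguments(
    output_dir="micae-lora",   # placeholder path
    learning_rate=2e-5,        # per-phase rate; final combined epoch used 1e-5
    lr_scheduler_type="cosine",
    warmup_steps=220,          # cosine warmup, per phase
    num_train_epochs=6,
)
```

Per the README, this configuration would be run once per phase (STX, then multi-turn chains), followed by one epoch on the combined data with `learning_rate=1e-5`.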