---
license: cc-by-nc-4.0
---

# R1-Distill-Llama-8B-Anima10
|
|
|
|
## This model is a work in progress.
|
|
|
This model is the result of 10 epochs of finetuning [deepseek-ai/DeepSeek-R1-Distill-Llama-8B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B) on a private corpus of 11 megabytes of hand-selected raw text, using a low learning rate and short token sequences.
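To give a rough sense of the scale of that run, here is a back-of-the-envelope sketch. The bytes-per-token ratio and the 512-token sequence length are assumptions for illustration; the card does not state the tokenizer efficiency or the exact sequence length used.

```python
# Rough scale estimate for the finetuning run described above.
# Assumptions (not stated in the card): ~4 bytes of raw text per token,
# and a 512-token sequence length standing in for "short token sequences".

CORPUS_BYTES = 11 * 1024 * 1024   # 11 MiB of hand-selected raw text
BYTES_PER_TOKEN = 4               # assumed average for English raw text
SEQ_LEN = 512                     # assumed "short" sequence length

tokens = CORPUS_BYTES // BYTES_PER_TOKEN
sequences_per_epoch = tokens // SEQ_LEN

print(tokens)               # approximate tokens in the corpus
print(sequences_per_epoch)  # approximate training sequences per epoch
```

Under these assumptions the corpus is on the order of a few million tokens, so each of the 10 epochs revisits the same small dataset, which is consistent with the style-transfer (and overfitting-adjacent) effects described below.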
|
|
|
The original intention was to influence the style of the model's thinking text, but the training appears to have led to other, unintended results.
|
|
|
|
|
|
It was originally trained for 3 epochs.
|
|
|
In testing, when it was asked "What is the fastest way to get around Europe?", the model fell into an endless loop of recursive (but relevant) thinking.
|
|
|
Also noteworthy was how slowly the training loss descended once it reached around 3.5.
|
|
|
To explore these observations further, an additional 7 epochs of training were scheduled; this model is the result of that run.
|
|
|
The model not only resolved the thinking loop on the Europe question but also broke past some of the 'hard stops' originally trained into the base model.
|
|
|
The model is currently undergoing additional training.