---
license: cc-by-nc-4.0
---

# R1-Distill-Llama-8B-Anima10

## This model is a work in progress.

This model is the result of 10 epochs of finetuning [deepseek-ai/DeepSeek-R1-Distill-Llama-8B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B) on a private corpus of 11 megabytes of hand-selected raw text, trained at a low learning rate with short token sequences.

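For a rough sense of the scale of such a run, the corpus size and epoch count above imply the step counts sketched below. The sequence length, batch size, and bytes-per-token ratio are illustrative assumptions, not the actual training settings, which were not published.

```python
# Back-of-envelope estimate of optimizer steps for a run like the one
# described above. Only the corpus size (11 MB) and epoch count (10)
# come from the model card; everything else is an assumption.
corpus_bytes = 11 * 1024 * 1024   # 11 megabytes of raw text
bytes_per_token = 4               # rough average for English text (assumption)
seq_len = 512                     # "short token sequences" (assumption)
batch_size = 8                    # assumption

tokens = corpus_bytes // bytes_per_token          # ~2.9M tokens
sequences_per_epoch = tokens // seq_len           # ~5.6K sequences
steps_per_epoch = sequences_per_epoch // batch_size
total_steps = steps_per_epoch * 10                # 10 epochs

print(tokens, steps_per_epoch, total_steps)       # 2883584 704 7040
```

Under these assumptions the whole run is only a few thousand optimizer steps, which is consistent with repeating many epochs over a small corpus at a low learning rate.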
The original intention was to influence the style of the model's thinking text, but it seems to have led to other unintended results.

It was originally trained for 3 epochs.

In testing, when asked "What is the fastest way to get around Europe?", it fell into an endless loop of recursive (but relevant) thinking.

Also noteworthy was the slow descent of the training loss once it reached around 3.5.

To explore these observations further, an additional 7 epochs of training were scheduled; this model is the result.

It not only resolved the thinking loop on the Europe question but also broke past some of the 'hard stops' originally trained into it.

The model is currently undergoing additional training.