---
license: cc-by-nc-4.0
---

# R1-Distill-Llama-8B-Anima10

## This model is a work in progress.

This model is the result of 10 epochs of finetuning [deepseek-ai/DeepSeek-R1-Distill-Llama-8B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B) on a private corpus of 11 megabytes of hand-selected raw text, trained at a low learning rate with short token sequences.

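For a rough sense of the scale of such a run, the corpus size and epoch count above imply the step counts sketched below. The sequence length, batch size, and bytes-per-token ratio are illustrative assumptions, not the actual training settings, which were not published.

```python
# Back-of-envelope estimate of optimizer steps for a run like the one
# described above. Only the corpus size (11 MB) and epoch count (10)
# come from the model card; everything else is an assumption.
corpus_bytes = 11 * 1024 * 1024   # 11 megabytes of raw text
bytes_per_token = 4               # rough average for English text (assumption)
seq_len = 512                     # "short token sequences" (assumption)
batch_size = 8                    # assumption

tokens = corpus_bytes // bytes_per_token          # ~2.9M tokens
sequences_per_epoch = tokens // seq_len           # ~5.6K sequences
steps_per_epoch = sequences_per_epoch // batch_size
total_steps = steps_per_epoch * 10                # 10 epochs

print(tokens, steps_per_epoch, total_steps)       # 2883584 704 7040
```

Under these assumptions the whole run is only a few thousand optimizer steps, which is consistent with repeating many epochs over a small corpus at a low learning rate.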
The original intention was to influence the style of the model's thinking text, but it seems to have led to other unintended results.

It was originally trained for 3 epochs.

In testing, when asked "What is the fastest way to get around Europe?", it fell into an endless loop of recursive (but relevant) thinking.

Also noteworthy was the slow descent of the training loss once it reached around 3.5.

To explore these observations further, an additional 7 epochs of training were scheduled; this model is the result.

It not only resolved the thinking loop on the Europe question but also broke past some of the 'hard stops' originally trained into it.

The model is currently undergoing additional training.