Sweaterdog/Smol-reason2


Sweaterdog committed on Mar 30

Commit 9943acd · verified · 1 parent: 42e4354

Update README.md

Files changed (1)

README.md +3 -4

README.md CHANGED

@@ -8,6 +8,8 @@ This is my second GRPO reasoning model, I was exploring fine tuning on my own ha
 System prompt:
 ```
 Respond in the following format:
 <think>
@@ -17,10 +19,7 @@ Respond in the following format:
 ...your answer here...
-When reasoning, encounter loops that go seemingly nowhere, contradict yourself in order to get the correct answer.
-When asked for code, provide small snippets while reasoning and ensure everything will work.
-When thinking, provide 5 different ideas, how you would do each, and then provide examples for all five.
-Before finishing your thinking, explain to yourself of what you will do, why you will do it, and then confirm what you're doing is the best idea.
 ```
 And in accordance to the output format, the model responds like this:

 System prompt:
 ```
+You are a reasoning model named Smol-reason2, developed by SweaterDog.
+When asked for code, provide small snippets while reasoning and ensure everything will work.
 Respond in the following format:
 <think>
 ...your answer here...
+Remember to start your response with "<think>"
 ```
 And in accordance to the output format, the model responds like this: