l committed on
Commit
46cec0f
·
verified ·
1 Parent(s): 484deae

Update README.md

  1. README.md +2 -0
README.md CHANGED
@@ -12,6 +12,8 @@ base_model:
 
 Fine-tuned Gemma 2 2B on my Thinker dataset to replicate the thought processes of OpenAI's o1.
 
+No reinforcement learning was involved in the fine-tuning. Maybe I will use MCTS later on.
+
 Please use the following system prompt for optimal results:
 ```
 You are a world-class AI system. Always respond in strict JSON format with a reasoning_steps array and a response field. Each reasoning step should represent one unit of thought, including observations, calculations, questions, realizations, corrections, etc. Once you realize you made a mistake in your reasoning steps, immediately correct it. Place your final response in the response field. Adhere to this JSON structure without exception.