hoanganhpham committed
Commit 5566b08 · verified · 1 Parent(s): db075b4

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -24,7 +24,7 @@ For SFT stage we using the hyperparameters:
  - Learning-Rate: 5e-5.
  - Number Of Epoch: 6.

- For RL stage we setup training with:
+ For the Reinforcement Learning (RL) stage, we designed a two-stage training process. The first stage focuses on enhancing the model's reasoning capabilities for complex medical questions. The second stage ensures that the model's responses prioritize safety and helpfulness. Both stages utilize the following configuration:

  - Max prompt length: 2048 tokens.
  - Max response length: 12288 tokens.
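As a minimal sketch of how the hyperparameters in this diff could be organized, the snippet below groups the SFT settings and the two RL stages into plain Python config objects. The class and stage names (SFTConfig, RLStageConfig, "reasoning", "safety_helpfulness") are illustrative assumptions, not the repository's actual training code.

```python
# Hypothetical config sketch based only on the values stated in the README diff.
from dataclasses import dataclass


@dataclass
class SFTConfig:
    learning_rate: float = 5e-5  # "Learning-Rate: 5e-5"
    num_epochs: int = 6          # "Number Of Epoch: 6"


@dataclass
class RLStageConfig:
    # Both RL stages share the same length limits per the updated README.
    max_prompt_length: int = 2048      # tokens
    max_response_length: int = 12288   # tokens


# Two-stage RL process: stage 1 targets medical reasoning,
# stage 2 targets safety and helpfulness.
sft_config = SFTConfig()
rl_stages = {
    "reasoning": RLStageConfig(),
    "safety_helpfulness": RLStageConfig(),
}
```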