Intelligent-Internet/II-Medical-8B-1706

Text Generation · text-generation-inference


hoanganhpham committed on 2 days ago

Commit 5566b08 · verified · 1 Parent(s): db075b4

Update README.md

Files changed (1)

README.md +1 -1

README.md CHANGED

@@ -24,7 +24,7 @@ For SFT stage we using the hyperparameters:
 - Learning-Rate: 5e-5.
 - Number Of Epoch: 6.
-For RL stage we setup training with:
+For the Reinforcement Learning (RL) stage, we designed a two-stage training process. The first stage focuses on enhancing the model's reasoning capabilities for complex medical questions. The second stage ensures that the model's responses prioritize safety and helpfulness. Both stages utilize the following configuration:
 - Max prompt length: 2048 tokens.
 - Max response length: 12288 tokens.
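The RL setup described in this change can be sketched as a small Python config. This is a minimal sketch that assumes nothing about the actual training framework: `RLStageConfig`, `STAGES`, and `fits_context` are hypothetical names, the stage labels paraphrase the two stages described in the diff, and only the two token limits (2048 and 12288) come from the README.

```python
from dataclasses import dataclass


# Hypothetical container for the RL settings listed in the README diff.
# Field names are illustrative, not taken from any specific training framework.
@dataclass(frozen=True)
class RLStageConfig:
    name: str
    max_prompt_length: int = 2048     # tokens, as stated in the README
    max_response_length: int = 12288  # tokens, as stated in the README

    @property
    def max_sequence_length(self) -> int:
        # Longest prompt plus longest response a single rollout can contain.
        return self.max_prompt_length + self.max_response_length


# The two RL stages described in the commit; both share the same length limits.
STAGES = (
    RLStageConfig(name="reasoning"),           # stage 1: complex medical reasoning
    RLStageConfig(name="safety_helpfulness"),  # stage 2: safe and helpful responses
)


def fits_context(stage: RLStageConfig, context_window: int) -> bool:
    """Return True if a maximal prompt plus response fits in the given window."""
    return stage.max_sequence_length <= context_window


if __name__ == "__main__":
    for stage in STAGES:
        print(stage.name, stage.max_sequence_length)
```

For example, `fits_context(STAGES[0], 16384)` returns `True` because a full rollout needs at most 2048 + 12288 = 14336 tokens; the 16384-token window is an arbitrary illustration, not the model's documented context length.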