LichengLiu03 committed on
Commit 91662d5 · verified · 1 Parent(s): 5fd7862

Update README.md

Files changed (1)
  1. README.md +2 -0
README.md CHANGED
@@ -16,6 +16,8 @@ pipeline_tag: text-generation
 
 This model is based on **Qwen2.5-3B-Instruct** and trained with **PPO (Proximal Policy Optimization)** on the **MetaMathQA** dataset for mathematical reasoning.
 
+Github: https://github.com/lichengliu03/unary-feedback
+
 ## Model Info
 
 - **Base model**: Qwen/Qwen2.5-3B-Instruct