p1atdev/qwen2.5-0.5b-grpo-math-01

p1atdev committed on Feb 6

Commit d56472a · verified · 1 Parent(s): 72dca67

Update README.md

Files changed (1)

README.md +5 -1

README.md CHANGED

@@ -7,6 +7,9 @@ base_model:
 - Qwen/Qwen2.5-0.5B
 ---
 prompt format:
 ```
@@ -56,9 +59,10 @@ print(pipe(prompt)[0]["generated_text"][len(prompt):])
 ## Training information
 - Device: 1x A100 80G
 - GPU Hour: about 1 hour
-- Base model: [Qwen/Qwen2.5-0.5B](https://huggingface.co/Qwen/Qwen2.5-0.5B)
 Wandb log: https://wandb.ai/p1atdev/grpo-math-01/runs/ytv8wxll

 - Qwen/Qwen2.5-0.5B
 ---
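+I tried training the model with GRPO so that it can solve simple arithmetic problems. The training code is further down.
+Because the problems are simple, the training data was synthesized on the fly. (See the code.)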
 prompt format:
 ```
 ## Training information
+- Base model: [Qwen/Qwen2.5-0.5B](https://huggingface.co/Qwen/Qwen2.5-0.5B)
 - Device: 1x A100 80G
 - GPU Hour: about 1 hour
+- Total training steps: 140 steps ([the last checkpoint](https://huggingface.co/p1atdev/qwen2.5-0.5b-grpo-math-01/blob/9ede090f5ed41d88c16ffbc56a81b0772f19679e/model.safetensors))
 Wandb log: https://wandb.ai/p1atdev/grpo-math-01/runs/ytv8wxll
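
The README note in this commit says the training data was synthesized on the fly from simple arithmetic problems and that the model was trained with GRPO. The author's actual training code is not part of this diff, so the following is only a minimal sketch under stated assumptions: the function names (`make_problem`, `reward`, `group_advantages`), the question format, the operand range, and the binary exact-match reward are all hypothetical and chosen for illustration. The one GRPO-specific idea it shows is the group-relative advantage, where each completion's reward is standardized against the other completions sampled for the same prompt.

```python
import random
import re
import statistics


def make_problem(rng):
    """Synthesize one simple arithmetic problem and its integer answer.

    Assumption: operand range, operators, and question format are made up
    here; they are not taken from the author's training code.
    """
    a, b = rng.randint(0, 99), rng.randint(0, 99)
    op = rng.choice(["+", "-", "*"])
    answer = {"+": a + b, "-": a - b, "*": a * b}[op]
    return f"{a} {op} {b} = ?", answer


def reward(completion, answer):
    """Hypothetical reward: 1.0 if the last integer in the completion
    equals the reference answer, otherwise 0.0."""
    numbers = re.findall(r"-?\d+", completion)
    return 1.0 if numbers and int(numbers[-1]) == answer else 0.0


def group_advantages(rewards, eps=1e-6):
    """Standardize rewards within one group of completions for one prompt.

    Assumption: population standard deviation plus a small epsilon to
    avoid division by zero when every completion gets the same reward.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    rng = random.Random(0)
    question, answer = make_problem(rng)
    # Stand-ins for completions sampled from the model for one prompt.
    completions = [f"The answer is {answer}", "I am not sure", f"{answer + 1}"]
    rewards = [reward(c, answer) for c in completions]
    print(question, rewards, group_advantages(rewards))
```

Generating problems from a seeded `random.Random` keeps the data reproducible without storing a dataset, which fits the "synthesized on the fly" description. If every completion in a group earns the same reward, all advantages come out as zero and that group contributes no learning signal.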