justinj92 committed
Commit 80fc379 · verified · 1 Parent(s): f3ec36a

Update README.md

Files changed (1)
  1. README.md +45 -11
README.md CHANGED
@@ -1,37 +1,71 @@
---
base_model: Qwen/Qwen2.5-1.5B-Instruct
library_name: transformers
- model_name: Qwen-1.5B-GRPO
tags:
- generated_from_trainer
- trl
- grpo
licence: license
---

- # Model Card for Qwen-1.5B-GRPO

This model is a fine-tuned version of [Qwen/Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct).
It has been trained using [TRL](https://github.com/huggingface/trl).

- ## Quick start

- ```python
- from transformers import pipeline

- question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
- generator = pipeline("text-generation", model="justinj92/Qwen-1.5B-GRPO", device="cuda")
- output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
- print(output["generated_text"])
- ```

## Training procedure

- [<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/justinjoy-5/huggingface/runs/fj1ij9cn)

This model was trained with GRPO, a method introduced in [DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models](https://huggingface.co/papers/2402.03300).

### Framework versions

- TRL: 0.15.0.dev0
 
---
base_model: Qwen/Qwen2.5-1.5B-Instruct
library_name: transformers
+ model_name: Qwen2.5-1.5B-Thinking-v1.1
tags:
- generated_from_trainer
- trl
- grpo
licence: license
+ datasets:
+ - microsoft/orca-math-word-problems-200k
+ # model-index:
+ # - name: Qwen2.5-1.5B-Thinking-v1.1
+ #   results:
+ #   - task:
+ #       type: text-generation
+ #     dataset:
+ #       name: openai/gsm8k
+ #       type: GradeSchoolMath8K
+ #     metrics:
+ #     - name: GSM8k (0-Shot)
+ #       type: GSM8k (0-Shot)
+ #       value: 14.4%
+ #     - name: GSM8k (Few-Shot)
+ #       type: GSM8k (Few-Shot)
+ #       value: 63.31%
+ co2_eq_emissions:
+   emissions: 7100
+   source: "https://mlco2.github.io/impact#compute"
+   training_type: "GRPO"
+   geographical_location: "East US2"
+   hardware_used: "1 x H100 96GB"
+
---

+ # Model Card for Qwen2.5-1.5B-Thinking-v1.1

This model is a fine-tuned version of [Qwen/Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct).
It has been trained using [TRL](https://github.com/huggingface/trl).

+ <!-- ## Evals

+ | Model | GSM8k 0-Shot (%) | GSM8k Few-Shot (%) |
+ |------------------------------------------|------------------|-------------------|
+ | Mistral-7B-v0.1 | 10 | 41 |
+ | Qwen2.5-1.5B-Thinking | 14.4 | 63.31 |
+ -->

## Training procedure

+ <img src="https://raw.githubusercontent.com/wandb/wandb/fc186783c86c33980e5c73f13363c13b2c5508b1/assets/logo-dark.svg" alt="Weights & Biases Logged" width="150" height="24"/>
+
+ <img src="https://huggingface.co/justinj92/Qwen2.5-1.5B-Thinking/resolve/main/w%26b_qwen_r1.png" width="1200" height="900"/>

+ Trained on 1xH100 96GB via Azure Cloud (East US2).

This model was trained with GRPO, a method introduced in [DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models](https://huggingface.co/papers/2402.03300).
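
The exact training script is not part of this card; the snippet below is a minimal sketch of GRPO fine-tuning with TRL's `GRPOTrainer` on the dataset listed in the metadata, where the reward function, the prompt-column mapping, and all hyperparameters are illustrative assumptions rather than the actual recipe.

```python
# Minimal GRPO fine-tuning sketch with TRL's GRPOTrainer.
# The reward function, column renaming, and hyperparameters are assumptions.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# The card lists microsoft/orca-math-word-problems-200k as training data;
# GRPOTrainer expects a "prompt" column, so "question" is renamed (assumed mapping).
dataset = load_dataset("microsoft/orca-math-word-problems-200k", split="train")
dataset = dataset.rename_column("question", "prompt")

def boxed_format_reward(completions, **kwargs):
    # Placeholder reward: favors completions that contain a \boxed{} answer.
    # A real run would also score answer correctness against the reference.
    return [1.0 if "\\boxed{" in completion else 0.0 for completion in completions]

training_args = GRPOConfig(output_dir="Qwen2.5-1.5B-GRPO", logging_steps=10)
trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-1.5B-Instruct",
    reward_funcs=boxed_format_reward,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```

GRPO samples a group of completions for each prompt, scores them with the reward functions, and computes advantages relative to the group mean instead of training a separate value model.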

+ ### Usage Recommendations
+
+ **We recommend adhering to the following configurations when using this model, including for benchmarking, to achieve the expected performance:**
+
+ 1. Set the temperature within the range of 0.5-0.7 (0.6 is recommended) to prevent endless repetitions or incoherent outputs; the sketch after this list shows these settings applied.
+ 2. **For mathematical problems, it is advisable to include a directive in your prompt such as: "Please reason step by step, and put your final answer within \boxed{}."**
+ 3. When evaluating model performance, it is recommended to conduct multiple tests and average the results.
+ 4. This model is not tuned for domains other than mathematics.
+
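
A minimal inference sketch that applies these settings; the repository id is inferred from the model name above, and `max_new_tokens` and `top_p` are illustrative assumptions rather than documented values.

```python
# Applies the recommended settings: sampling with temperature 0.6, and a
# "reason step by step ... \boxed{}" directive appended to the prompt.
# The repo id, max_new_tokens, and top_p below are assumptions.
from transformers import pipeline

question = (
    "A bookshop sold 48 books in April and half as many in May. "
    "How many books did it sell in total? "
    "Please reason step by step, and put your final answer within \\boxed{}."
)

generator = pipeline(
    "text-generation",
    model="justinj92/Qwen2.5-1.5B-Thinking-v1.1",
    device="cuda",
)
output = generator(
    [{"role": "user", "content": question}],
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
    return_full_text=False,
)[0]
print(output["generated_text"])
```
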
### Framework versions

- TRL: 0.15.0.dev0