Ach0 committed · Commit 4c6b181 · verified · 1 parent: 63912d6

Update README.md

Files changed (1)
  1. README.md +73 -0
README.md CHANGED
@@ -29,6 +29,79 @@ This approach ensures:
 
 ---
 
+## 🛠️ Model Use
+
+### ✅ Use with Hugging Face Transformers
+
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+model_name = "Ach0/GCPO-R1-1.5B"
+
+tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
+model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto", trust_remote_code=True)
+
+question = """
+Solve the following math problem efficiently and clearly. The last line of your response should be of the following format: 'Therefore, the final answer is: $\\boxed{{ANSWER}}$. I hope it is correct' (without quotes) where ANSWER is just the final number or expression that solves the problem. Think step by step before answering.
+
+Point $B$ is on $\\overline{AC}$ with $AB = 9$ and $BC = 21.$ Point $D$ is not on $\\overline{AC}$ so that $AD = CD,$ and $AD$ and $BD$ are integers. Let $s$ be the sum of all possible perimeters of $\\triangle ACD$. Find $s.$
+"""
+
+messages = [
+    {"role": "user", "content": question}
+]
+
+prompt = tokenizer.apply_chat_template(
+    messages,
+    tokenize=False,
+    add_generation_prompt=True,
+)
+
+inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+outputs = model.generate(**inputs, max_new_tokens=8192)
+print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+```
+
+### ✅ Use with vLLM(fast inference)
+
+```python
+from vllm import LLM, SamplingParams
+from transformers import AutoTokenizer
+
+model_name = "Ach0/GCPO-R1-1.5B"
+
+tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
+llm = LLM(model=model_name, trust_remote_code=True)
+
+sampling_params = SamplingParams(
+    temperature=0.7,
+    top_p=0.8,
+    top_k=20,
+    max_tokens=8192
+)
+
+question = """
+Solve the following math problem efficiently and clearly. The last line of your response should be of the following format: 'Therefore, the final answer is: $\\boxed{{ANSWER}}$. I hope it is correct' (without quotes) where ANSWER is just the final number or expression that solves the problem. Think step by step before answering.
+
+Point $B$ is on $\\overline{AC}$ with $AB = 9$ and $BC = 21.$ Point $D$ is not on $\\overline{AC}$ so that $AD = CD,$ and $AD$ and $BD$ are integers. Let $s$ be the sum of all possible perimeters of $\\triangle ACD$. Find $s.$
+"""
+
+messages = [
+    {"role": "user", "content": question}
+]
+
+prompt = tokenizer.apply_chat_template(
+    messages,
+    tokenize=False,
+    add_generation_prompt=True
+)
+
+outputs = llm.generate([prompt], sampling_params)
+print(outputs[0].outputs[0].text)
+```
+
+---
+
 ## 📊 GCPO Improves Reasoning Performance
 
 GCPO consistently outperforms DAPO.
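
Review note (not part of the commit): the prompt added in both snippets asks the model to finish with a `\boxed{...}` expression, but neither snippet shows how to read that answer back out of the generated text. A minimal sketch of one way to do it follows. The helper name `extract_boxed_answer` is hypothetical, and the sketch assumes the answer of interest is the last `\boxed{...}` in the output and that its braces are balanced.

```python
from typing import Optional


def extract_boxed_answer(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in `text`, or None.

    Hypothetical helper for illustration; it is not part of the README
    or of any library used above.
    """
    marker = "\\boxed{"
    start = text.rfind(marker)  # last occurrence, since the prompt puts the answer at the end
    if start == -1:
        return None

    depth = 1  # we are already inside the opening brace of \boxed{
    chars = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                # Matching close brace found: everything collected so far is the answer.
                return "".join(chars)
        chars.append(ch)

    return None  # braces never balanced, e.g. the generation was truncated


# Example with a made-up response string (not real model output):
sample = "Therefore, the final answer is: $\\boxed{\\frac{1}{2}}$. I hope it is correct"
print(extract_boxed_answer(sample))
```

Counting brace depth rather than using a simple regular expression keeps nested LaTeX such as `\frac{1}{2}` intact. With the vLLM snippet, the input would be `outputs[0].outputs[0].text`; with the Transformers snippet, the decoded string also contains the prompt, which is why the sketch searches for the last occurrence.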