HeisenbergQ-0.5B is a fine-tuned version of Qwen2.5-0.5B-Instruct, optimized for quantum-physics reasoning using GRPO (Group Relative Policy Optimization) reinforcement learning with custom reward functions. The model is trained to produce structured answers in XML format with <reasoning> and <answer> tags, and it excels at step-by-step logical reasoning on physics problems.
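The custom reward functions used during GRPO training are not published on this card. As a rough illustration only, a format reward of the kind common in GRPO recipes might look like the sketch below (the function name, pattern, and scores are assumptions, not the model's actual training code):

import re

# Hypothetical GRPO format reward: score 1.0 when a completion follows the
# expected <reasoning>...</reasoning><answer>...</answer> layout, else 0.0.
# The rewards actually used to train HeisenbergQ-0.5B may differ.
def format_reward(completions):
    pattern = r"<reasoning>.*?</reasoning>\s*<answer>.*?</answer>"
    return [1.0 if re.search(pattern, c, re.DOTALL) else 0.0 for c in completions]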
Use the code below to get started with the model.
from huggingface_hub import login
from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer
from peft import PeftModel

login(token="")  # paste your Hugging Face access token here

tokenizer = AutoTokenizer.from_pretrained("unsloth/Qwen2.5-0.5B-Instruct")
base_model = AutoModelForCausalLM.from_pretrained(
    "unsloth/Qwen2.5-0.5B-Instruct",
    device_map={"": 0},  # place the model on GPU 0
    token="",
)
# Attach the GRPO-trained LoRA adapter to the base model
model = PeftModel.from_pretrained(base_model, "khazarai/HeisenbergQ-0.5B-RL")
system = """
Respond in the following format:
<reasoning>
...
</reasoning>
<answer>
...
</answer>
"""
question = """
What is the significance of setting mass equal to 1 in a quantum dynamical system, and how does it impact the formulation of the Hamiltonian and the operators?
"""
messages = [
{"role": "system", "content": system},
{"role": "user", "content": question}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
# Stream tokens to stdout as they are generated
_ = model.generate(
    **tokenizer(text, return_tensors="pt").to("cuda"),
    max_new_tokens=1800,
    streamer=TextStreamer(tokenizer, skip_prompt=True),
)
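If you plan to serve the model without PEFT at inference time, the adapter can first be folded into the base weights (merge_and_unload is the standard PEFT call for LoRA adapters; whether the published checkpoint was merged is not stated on this card):

# Optional: merge the LoRA adapter into the base weights for standalone inference
model = model.merge_and_unload()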
To run the same prompt through the Transformers pipeline API instead:
from transformers import pipeline, AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

tokenizer = AutoTokenizer.from_pretrained("unsloth/Qwen2.5-0.5B-Instruct")
base_model = AutoModelForCausalLM.from_pretrained("unsloth/Qwen2.5-0.5B-Instruct")
# Attach the GRPO-trained LoRA adapter
model = PeftModel.from_pretrained(base_model, "khazarai/HeisenbergQ-0.5B-RL")
question = """
What is the significance of setting mass equal to 1 in a quantum dynamical system, and how does it impact the formulation of the Hamiltonian and the operators?
"""
system = """
Respond in the following format:
<reasoning>
...
</reasoning>
<answer>
...
</answer>
"""
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
messages = [
    {"role": "system", "content": system},
    {"role": "user", "content": question},
]
out = pipe(messages, max_new_tokens=1800)
print(out[0]["generated_text"][-1]["content"])  # the assistant's reply
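Because the model wraps its output in <reasoning> and <answer> tags, the final answer can be pulled out with a small parsing helper (this helper is illustrative and not part of the model card):

import re

# Hypothetical helper: return only the contents of the <answer> block,
# falling back to the full text if the tags are missing.
def extract_answer(text: str) -> str:
    match = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    return match.group(1).strip() if match else text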
Base model: Qwen/Qwen2.5-0.5B