Model Card for HeisenbergQ-0.5B

Model Details

HeisenbergQ-0.5B is a fine-tuned version of Qwen2.5-0.5B-Instruct, optimized for quantum-physics reasoning using GRPO reinforcement learning with custom reward functions. The model is trained to produce structured answers in XML format with <reasoning> and <answer> tags, and it excels at step-by-step logical reasoning on physics problems.
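Because responses follow the fixed <reasoning>/<answer> structure, they are easy to post-process. A minimal sketch of a parser (the helper name `parse_response` is illustrative, not part of this repository):

```python
import re

def parse_response(text: str) -> dict:
    """Extract the <reasoning> and <answer> blocks from a model response.
    Returns an empty string for any tag that is missing."""
    out = {}
    for tag in ("reasoning", "answer"):
        m = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
        out[tag] = m.group(1).strip() if m else ""
    return out
```
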

Model Description

  • Language(s) (NLP): English
  • License: MIT
  • Finetuned from model: unsloth/Qwen2.5-0.5B-Instruct
  • Fine-Tuning Method: GRPO with LoRA
  • Domain: Quantum Physics
  • Dataset: jilp00/YouToks-Instruct-Quantum-Physics-II

Uses

Direct Use

  • Primary: Solving and reasoning through quantum physics problems
  • Secondary: General scientific reasoning in math & physics
  • Not for: General-purpose conversation (model is specialized)

Bias, Risks, and Limitations

  • Trained only on ~1K samples (domain-specific)
  • May hallucinate outside physics domain
  • Small 0.5B parameter size = lightweight, but reasoning depth is limited compared to larger models

How to Get Started with the Model

Use the code below to get started with the model.

from huggingface_hub import login
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

login(token="")

tokenizer = AutoTokenizer.from_pretrained("unsloth/Qwen2.5-0.5B-Instruct")
base_model = AutoModelForCausalLM.from_pretrained(
    "unsloth/Qwen2.5-0.5B-Instruct",
    device_map={"": 0}, token=""
)

model = PeftModel.from_pretrained(base_model, "khazarai/HeisenbergQ-0.5B-RL")

system = """
Respond in the following format:
<reasoning>
...
</reasoning>
<answer>
...
</answer>
"""

question = """
What is the significance of setting mass equal to 1 in a quantum dynamical system, and how does it impact the formulation of the Hamiltonian and the operators?
"""

messages = [
    {"role": "system", "content": system},
    {"role": "user", "content": question}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize = False,
    add_generation_prompt = True,
)

from transformers import TextStreamer
_ = model.generate(
    **tokenizer(text, return_tensors = "pt").to("cuda"),
    max_new_tokens = 1800,
    streamer = TextStreamer(tokenizer, skip_prompt = True),
)

To use the model with a pipeline:

from transformers import pipeline, AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

tokenizer = AutoTokenizer.from_pretrained("unsloth/Qwen2.5-0.5B-Instruct")
base_model = AutoModelForCausalLM.from_pretrained("unsloth/Qwen2.5-0.5B-Instruct")
model = PeftModel.from_pretrained(base_model, "khazarai/HeisenbergQ-0.5B-RL")

question = """
What is the significance of setting mass equal to 1 in a quantum dynamical system, and how does it impact the formulation of the Hamiltonian and the operators?
"""

system = """
Respond in the following format:
<reasoning>
...
</reasoning>
<answer>
...
</answer>
"""

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
messages = [
    {"role" : "system", "content" : system},
    {"role": "user", "content": question}
]
pipe(messages)

Training Details

Training Procedure

  • Training Method: GRPO (Group Relative Policy Optimization)
  • Reward Functions:
      • Reasoning Quality Reward: encourages logical markers and coherent chains of thought
      • Token Count Reward: prevents under- or over-explaining
      • XML Reward: enforces the <reasoning>/<answer> format
      • Soft Format Reward: ensures graceful handling of edge cases
  • Steps: ~390 steps (3 epochs)
  • Batch Size: 16 (with 2 generations per prompt)
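The reward implementations themselves are not published in this card. As a rough illustration only, an XML-format reward for GRPO might score tag placement like this (the function name and the per-tag value of 0.25 are assumptions, not the values used in training):

```python
def xml_format_reward(completion: str) -> float:
    """Illustrative sketch: award 0.25 for each tag that appears exactly
    once, so a fully well-formed <reasoning>/<answer> response scores 1.0.
    The actual reward shaping used for HeisenbergQ-0.5B is not published."""
    score = 0.0
    for tag in ("<reasoning>", "</reasoning>", "<answer>", "</answer>"):
        if completion.count(tag) == 1:
            score += 0.25
    return score
```

During GRPO training, a function like this would be evaluated on each of the generations sampled per prompt, and the relative scores within the group drive the policy update.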

Framework versions

  • PEFT 0.15.2