# Qwen 2.5 0.5B Calculator Agent
This is a fine-tuned version of Qwen 2.5 0.5B Instruct trained to use a calculator tool through multi-turn reinforcement learning with GRPO.
A much more performant 3B model was also trained and can be found here.
The GitHub repo documents the training run process in depth.
## Model Description
The Qwen 2.5 0.5B model was adapted to interface with a recursive calculator environment that supports addition, subtraction, multiplication, and division. The agent emits structured tool calls as YAML wrapped in XML tags, which the environment parses and executes. After receiving the computed result from the tool, the model formulates a final human-readable response.
## Key Achievements
- Training Method: GRPO, using a hybrid reward signal that combines LLM-as-a-judge feedback with programmatic verification (a minimal sketch of the programmatic check follows this list).
- Evaluation Accuracy:
  - Before RL: 0.6%
  - After RL: 34%
  - Absolute Gain: +33.4 pts
- Training Cost: $18 (£13.47) on 8x RTX 6000 Ada GPUs
- Total Training Time: ~3 hours
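The actual reward implementation lives in the linked repo; purely as an illustrative sketch, the programmatic-verification half might resemble the check below (the function name, answer-extraction regex, and tolerance are assumptions, not the project's code):

```python
import re

def verification_reward(completion: str, expected: float, tol: float = 1e-2) -> float:
    """Hypothetical programmatic check: reward 1.0 when the last number in
    the completion matches the ground-truth value within a tolerance.
    The real signal also blends in LLM-as-a-judge feedback."""
    # Pull numbers out of the completion, allowing thousands separators
    # like "645,500.97".
    numbers = re.findall(r"-?[\d,]+(?:\.\d+)?", completion)
    if not numbers:
        return 0.0
    try:
        answer = float(numbers[-1].replace(",", ""))
    except ValueError:
        return 0.0
    return 1.0 if abs(answer - expected) <= tol else 0.0
```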
## Evaluation Dataset
The evaluation dataset consists of synthetically generated arithmetic problems designed to be difficult for humans to solve without a calculator. Questions include nested operations and diverse real-world phrasing.
## Usage Instructions

### Requirements
- Transformers or vLLM for inference
- Flash Attention recommended for speed
- For training/RL: see the full setup in the GitHub repo
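A minimal inference sketch with `transformers` (the repo id below is a placeholder for this model card's id, and the chat-template usage follows the base Qwen instruct model; both are assumptions):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "path/to/this-model"  # replace with this model card's repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

messages = [
    {"role": "user", "content": "What's the sum of 987 times 654, and 987 "
                                "divided by the total of 321 and 11?"},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# The model should first emit a <calculator>...</calculator> tool call.
output_ids = model.generate(inputs, max_new_tokens=256)
tool_call_text = tokenizer.decode(
    output_ids[0][inputs.shape[-1]:], skip_special_tokens=True
)
print(tool_call_text)
```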
Example Input:

```
What's the sum of 987 times 654, and 987 divided by the total of 321 and 11?
```
Expected Output:

```
<calculator>
operation: add
operands:
  - operation: multiply
    operands:
      - 987
      - 654
  - operation: divide
    operands:
      - 987
      - operation: add
        operands:
          - 321
          - 11
</calculator>
```
This output must be passed to the environment, which parses and evaluates it. A Python example is available here.
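For reference, a minimal evaluator sketch assuming PyYAML and the operation names shown above (the actual environment in the repo may differ):

```python
import math
import re
import yaml  # pip install pyyaml

def evaluate(node):
    """Recursively evaluate a parsed node: either a bare number or an
    {operation, operands} mapping."""
    if isinstance(node, (int, float)):
        return node
    operands = [evaluate(child) for child in node["operands"]]
    op = node["operation"]
    if op == "add":
        return sum(operands)
    if op == "subtract":
        return operands[0] - sum(operands[1:])
    if op == "multiply":
        return math.prod(operands)
    if op == "divide":
        result = operands[0]
        for divisor in operands[1:]:
            result /= divisor
        return result
    raise ValueError(f"unknown operation: {op}")

def run_calculator(completion: str) -> float:
    """Extract the YAML body of the <calculator> tag and evaluate it."""
    match = re.search(r"<calculator>(.*?)</calculator>", completion, re.DOTALL)
    if match is None:
        raise ValueError("no <calculator> call found")
    return evaluate(yaml.safe_load(match.group(1)))
```

On the example above, `round(run_calculator(tool_call_text), 2)` evaluates 987 × 654 + 987 / (321 + 11) and returns 645500.97.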
The output from the environment should then be provided back to the model as:

```
<output>
{tool output}
</output>
```
The model then generates its final response:

```
The result of the calculation is 645,500.97
```
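Putting the pieces together, the multi-turn loop looks roughly like the sketch below, reusing the hypothetical `run_calculator` and the variables from the inference sketch. Feeding the tool result back as a user turn is an assumption; check the GitHub repo for the exact chat format used in training:

```python
# Append the tool call and its computed result, then generate the final answer.
result = run_calculator(tool_call_text)
messages.append({"role": "assistant", "content": tool_call_text})
messages.append({"role": "user", "content": f"<output>\n{result}\n</output>"})

inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
final_ids = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(final_ids[0][inputs.shape[-1]:], skip_special_tokens=True))
```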
## License and Attribution
- Base model: Qwen 2.5 0.5B Instruct
- Fine-tuned by: Dan Austin
- Repository: GitHub Project
## Training Framework Acknowledgement
This model was trained using parts of the Verifiers framework for structured reinforcement learning. If you use this model or build upon this work, please consider citing:
```bibtex
@article{brown2025verifiers,
  title={Verifiers: Reinforcement Learning with LLMs in Verifiable Environments},
  author={Brown, William},
  year={2025}
}
```