Model Card for Qwen3-0.6B-math-orca-qlora-10k-ep1

This model is a fine-tuned version of Qwen/Qwen3-0.6B specialized for mathematical reasoning tasks. It was trained with TRL using QLoRA, which keeps the number of trainable parameters low while preserving performance.

Performance

This fine-tuned 0.6B model achieves impressive performance on mathematical reasoning benchmarks:

Model                   GSM8K Accuracy   Improvement
Base Qwen3-0.6B         20.17%           -
Fine-tuned Qwen3-0.6B   43.06%           +113%

This more than twofold gain demonstrates the effectiveness of the fine-tuning approach, with GSM8K accuracy approaching that of considerably larger models.
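
As an illustration, GSM8K accuracy of this kind can be measured with EleutherAI's lm-evaluation-harness. This is a generic sketch and may not exactly match the evaluation scripts in the repository linked below:

import lm_eval

# Score the fine-tuned model on GSM8K with the harness defaults.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=tyfeng1997/Qwen3-0.6B-math-orca-qlora-10k-ep1,dtype=bfloat16",
    tasks=["gsm8k"],
    batch_size=8,
)
print(results["results"]["gsm8k"])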

Quick start

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "tyfeng1997/Qwen3-0.6B-math-orca-qlora-10k-ep1",
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("tyfeng1997/Qwen3-0.6B-math-orca-qlora-10k-ep1", trust_remote_code=True)

# Solve a math problem
question = "If 8x + 5 = 3x - 15, what is the value of x?"
messages = [
    {"role": "system", "content": "Solve the given math problem step by step, showing all your work."},
    {"role": "user", "content": question}
]

# Format messages using the chat template
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

# Generate response
outputs = model.generate(
    **inputs,           # passes input_ids and attention_mask together
    max_new_tokens=512,
    do_sample=True,     # sampling must be enabled for temperature to take effect
    temperature=0.2
)

# Decode and print response
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)

Example Output

To solve this equation, I need to isolate the variable x.

Given equation: 8x + 5 = 3x - 15

Step 1: Subtract 3x from both sides to get all x terms on the left side.
8x + 5 - 3x = 3x - 15 - 3x
5x + 5 = -15

Step 2: Subtract 5 from both sides.
5x + 5 - 5 = -15 - 5
5x = -20

Step 3: Divide both sides by 5 to isolate x.
5x/5 = -20/5
x = -4

Therefore, the value of x is -4.

Training procedure

This model was fine-tuned using Supervised Fine-Tuning (SFT) on a dataset of mathematics problems and step-by-step solutions. The training used QLoRA to efficiently adapt the model while keeping most parameters frozen.

Training configuration:

  • QLoRA with rank 16
  • 1 epoch
  • Learning rate: 2.0e-4
  • Batch size: 8 (effective batch size with gradient accumulation: 16)
  • BF16 precision
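
As a reference, here is a minimal sketch of this setup using TRL's SFTTrainer with a 4-bit base model and a rank-16 LoRA adapter. The dataset choice, LoRA alpha, and target modules below are illustrative assumptions; the exact values live in the GitHub repository linked below.

import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

# Load the base model in 4-bit NF4 quantization (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-0.6B", quantization_config=bnb_config, device_map="auto"
)

# Rank-16 LoRA adapter; alpha and target modules are assumptions.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,                # assumption
    target_modules="all-linear",  # assumption
    task_type="CAUSAL_LM",
)

# Convert a math dataset into chat-format messages. The source dataset
# and 10k split are placeholders for the repo's data-preparation script.
def to_messages(example):
    return {"messages": [
        {"role": "system", "content": "Solve the given math problem step by step, showing all your work."},
        {"role": "user", "content": example["question"]},
        {"role": "assistant", "content": example["answer"]},
    ]}

dataset = load_dataset("microsoft/orca-math-word-problems-200k", split="train[:10000]")
dataset = dataset.map(to_messages, remove_columns=dataset.column_names)

# Hyperparameters from the list above: 1 epoch, lr 2e-4, bf16,
# per-device batch size 8 with gradient accumulation 2 (effective 16).
training_args = SFTConfig(
    output_dir="Qwen3-0.6B-math-orca-qlora-10k-ep1",
    num_train_epochs=1,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,
    learning_rate=2.0e-4,
    bf16=True,
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    peft_config=peft_config,
)
trainer.train()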

Code and Reproducibility

The code for this project is available on GitHub: https://github.com/tyfeng1997/qwen3-finetune

The repository includes scripts for:

  • Data preparation
  • Training with QLoRA
  • Merging weights (see the first sketch after this list)
  • Evaluation on math benchmarks
  • Deployment with vLLM (see the second sketch after this list)
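
For merging, a sketch with PEFT's merge_and_unload might look like this; the adapter path is a placeholder, and the repository's merge script is authoritative:

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the full-precision base model and attach the trained LoRA adapter.
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-0.6B", torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base, "path/to/qlora-adapter")  # placeholder path

# Fold the LoRA weights into the base weights and save a standalone checkpoint.
merged = model.merge_and_unload()
merged.save_pretrained("Qwen3-0.6B-math-orca-qlora-10k-ep1")
AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B").save_pretrained("Qwen3-0.6B-math-orca-qlora-10k-ep1")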
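
And a minimal offline-inference sketch with vLLM (the sampling parameters are illustrative):

from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_id = "tyfeng1997/Qwen3-0.6B-math-orca-qlora-10k-ep1"
llm = LLM(model=model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [
    {"role": "system", "content": "Solve the given math problem step by step, showing all your work."},
    {"role": "user", "content": "If 8x + 5 = 3x - 15, what is the value of x?"},
]
# Apply the model's chat template, then generate with vLLM.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = llm.generate([prompt], SamplingParams(temperature=0.2, max_tokens=512))
print(outputs[0].outputs[0].text)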

Framework versions

  • TRL: 0.18.0.dev0
  • Transformers: 4.52.0.dev0
  • PyTorch: 2.6.0
  • Datasets: 3.5.1
  • Tokenizers: 0.21.1

Usage and Limitations

This model is specifically optimized for mathematical reasoning and may not perform as well on general-purpose tasks. It excels at step-by-step problem solving for high-school-level mathematics.

Citations

Cite TRL as:

@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}

If you use this model in your research, please cite:

@misc{qwen3-0.6B-math,
    author = {Feng, Bo},
    title = {Qwen3-0.6B-math: Fine-tuned small language model for mathematical reasoning},
    year = {2025},
    publisher = {GitHub},
    howpublished = {\url{https://github.com/tyfeng1997/qwen3-finetune}}
}