Model Card for Qwen3-0.6B-math-orca-qlora-10k-ep1
This model is a fine-tuned version of Qwen/Qwen3-0.6B specialized for mathematical reasoning. It was trained with TRL using QLoRA, which learns a small set of low-rank adapter weights on top of a quantized, frozen base model, keeping the number of trainable parameters low.
Performance
Despite its 0.6B parameter count, the fine-tuned model more than doubles the base model's accuracy on the GSM8K mathematical reasoning benchmark:
| Model | GSM8K Accuracy | Relative Improvement |
|---|---|---|
| Base Qwen3-0.6B | 20.17% | - |
| Fine-tuned Qwen3-0.6B | 43.06% | +113% |
This is a +113% relative improvement (+22.9 percentage points) from a single epoch of fine-tuning, bringing the 0.6B model's GSM8K accuracy into a range more typical of substantially larger models.
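For context, GSM8K accuracy is commonly scored by extracting the final numeric answer from the model's completion and comparing it to the reference answer. The sketch below illustrates that convention; it is an assumption about the scoring method, not the exact evaluation script used for the numbers above (see the repository linked below for that).

```python
# Hedged sketch: common GSM8K-style scoring by final-number exact match.
# In GSM8K, the gold answer follows a "#### " marker.
import re

def extract_final_number(text: str) -> str | None:
    """Return the last number appearing in a completion, commas stripped."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return matches[-1] if matches else None

def is_correct(model_output: str, reference_answer: str) -> bool:
    pred = extract_final_number(model_output)
    gold = extract_final_number(reference_answer.split("####")[-1])
    return pred is not None and pred == gold

# Example: the Quick start problem, whose answer is -4
print(is_correct("Therefore, the value of x is -4.", "#### -4"))  # True
```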
Quick start
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "tyfeng1997/Qwen3-0.6B-math-orca-qlora-10k-ep1", trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(
    "tyfeng1997/Qwen3-0.6B-math-orca-qlora-10k-ep1", trust_remote_code=True
)

# Solve a math problem
question = "If 8x + 5 = 3x - 15, what is the value of x?"
messages = [
    {"role": "system", "content": "Solve the given math problem step by step, showing all your work."},
    {"role": "user", "content": question},
]

# Format messages using the chat template and append the generation prompt
input_text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

# Generate a response (do_sample=True so the temperature setting takes effect)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.2,
)

# Decode and print only the newly generated tokens
response = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(response)
```
Example Output
```
To solve this equation, I need to isolate the variable x.

Given equation: 8x + 5 = 3x - 15

Step 1: Subtract 3x from both sides to get all x terms on the left side.
8x + 5 - 3x = 3x - 15 - 3x
5x + 5 = -15

Step 2: Subtract 5 from both sides.
5x + 5 - 5 = -15 - 5
5x = -20

Step 3: Divide both sides by 5 to isolate x.
5x/5 = -20/5
x = -4

Therefore, the value of x is -4.
```
Training procedure
This model was fine-tuned using Supervised Fine-Tuning (SFT) on a dataset of mathematics problems with step-by-step solutions. Training used QLoRA to adapt the model efficiently while keeping the base model's weights frozen.
Training configuration (a sketch of this setup follows the list):
- QLoRA with rank 16
- 1 epoch
- Learning rate: 2.0e-4
- Batch size: 8 (effective batch size with gradient accumulation: 16)
- BF16 precision
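For reference, a minimal sketch of this setup with TRL's SFTTrainer might look like the following. Only rank 16, 1 epoch, the learning rate, batch sizes, and BF16 are stated above; the dataset choice and preprocessing, LoRA alpha, target modules, and gradient-accumulation split are assumptions. The repository linked below contains the actual training scripts.

```python
# Hedged sketch of SFT with QLoRA (4-bit base weights + LoRA adapters) via TRL.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

# 4-bit NF4 quantization of the frozen base model (the "Q" in QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

peft_config = LoraConfig(
    r=16,                       # QLoRA rank stated above
    lora_alpha=32,              # assumption: alpha is not stated in this card
    target_modules="all-linear",  # assumption
    task_type="CAUSAL_LM",
)

training_args = SFTConfig(
    output_dir="qwen3-0.6b-math-qlora",  # placeholder path
    num_train_epochs=1,
    learning_rate=2.0e-4,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,       # 8 x 2 = effective batch size 16
    bf16=True,
    model_init_kwargs={"quantization_config": bnb_config},
)

# Assumption based on the model name: a 10k slice of an Orca-style math dataset
dataset = load_dataset(
    "microsoft/orca-math-word-problems-200k", split="train[:10000]"
)

def to_messages(example):
    """Convert question/answer pairs to the conversational format SFTTrainer expects."""
    return {"messages": [
        {"role": "user", "content": example["question"]},
        {"role": "assistant", "content": example["answer"]},
    ]}

dataset = dataset.map(to_messages, remove_columns=dataset.column_names)

trainer = SFTTrainer(
    model="Qwen/Qwen3-0.6B",
    args=training_args,
    train_dataset=dataset,
    peft_config=peft_config,
)
trainer.train()
```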
Code and Reproducibility
The code for this project is available on GitHub: https://github.com/tyfeng1997/qwen3-finetune
The repository includes scripts for:
- Data preparation
- Training with QLoRA
- Merging weights
- Evaluation on math benchmarks
- Deployment with vLLM (a minimal serving sketch follows below)
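A minimal sketch of offline inference with vLLM, assuming this repository hosts merged full weights (per the weight-merging step above) so vLLM can load them directly by model ID:

```python
# Hedged sketch: offline chat inference with vLLM
from vllm import LLM, SamplingParams

llm = LLM(model="tyfeng1997/Qwen3-0.6B-math-orca-qlora-10k-ep1")
sampling = SamplingParams(temperature=0.2, max_tokens=512)

messages = [
    {"role": "system", "content": "Solve the given math problem step by step, showing all your work."},
    {"role": "user", "content": "If 8x + 5 = 3x - 15, what is the value of x?"},
]

# llm.chat applies the model's chat template before generating
outputs = llm.chat(messages, sampling)
print(outputs[0].outputs[0].text)
```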
Framework versions
- TRL: 0.18.0.dev0
- Transformers: 4.52.0.dev0
- PyTorch: 2.6.0
- Datasets: 3.5.1
- Tokenizers: 0.21.1
Usage and Limitations
This model is optimized specifically for mathematical reasoning and may not perform as well on general-purpose tasks. It excels at step-by-step problem solving for high-school-level mathematics.
Citations
Cite TRL as:
```bibtex
@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}
```
If you use this model in your research, please cite:
```bibtex
@misc{qwen3-0.6B-math,
    author       = {Feng, Bo},
    title        = {Qwen3-0.6B-math: Fine-tuned small language model for mathematical reasoning},
    year         = {2025},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/tyfeng1997/qwen3-finetune}}
}
```