Model Card for Qwen3-0.6B-math-orca-qlora-10k-ep1

This model is a fine-tuned version of Qwen/Qwen3-0.6B specialized for mathematical reasoning tasks. It was trained with TRL using QLoRA, which keeps the number of trainable parameters low while preserving performance.

Performance

This fine-tuned 0.6B model achieves impressive performance on mathematical reasoning benchmarks:

Model                   GSM8K Accuracy   Improvement
Base Qwen3-0.6B         20.17%           -
Fine-tuned Qwen3-0.6B   43.06%           +113%

This more than twofold gain demonstrates the effectiveness of the fine-tuning approach, with GSM8K accuracy approaching that of considerably larger models.
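
As an illustration, GSM8K accuracy of this kind can be measured with EleutherAI's lm-evaluation-harness. This is a generic sketch and may not exactly match the evaluation scripts in the repository linked below:

import lm_eval

# Score the fine-tuned model on GSM8K with the harness defaults.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=tyfeng1997/Qwen3-0.6B-math-orca-qlora-10k-ep1,dtype=bfloat16",
    tasks=["gsm8k"],
    batch_size=8,
)
print(results["results"]["gsm8k"])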

Quick start

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "tyfeng1997/Qwen3-0.6B-math-orca-qlora-10k-ep1",
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("tyfeng1997/Qwen3-0.6B-math-orca-qlora-10k-ep1", trust_remote_code=True)

# Solve a math problem
question = "If 8x + 5 = 3x - 15, what is the value of x?"
messages = [
    {"role": "system", "content": "Solve the given math problem step by step, showing all your work."},
    {"role": "user", "content": question}
]

# Format messages using the chat template
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

# Generate response
outputs = model.generate(
    **inputs,           # passes input_ids and attention_mask together
    max_new_tokens=512,
    do_sample=True,     # sampling must be enabled for temperature to take effect
    temperature=0.2
)

# Decode and print response
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)

Example Output

To solve this equation, I need to isolate the variable x.

Given equation: 8x + 5 = 3x - 15

Step 1: Subtract 3x from both sides to get all x terms on the left side.
8x + 5 - 3x = 3x - 15 - 3x
5x + 5 = -15

Step 2: Subtract 5 from both sides.
5x + 5 - 5 = -15 - 5
5x = -20

Step 3: Divide both sides by 5 to isolate x.
5x/5 = -20/5
x = -4

Therefore, the value of x is -4.

Training procedure

This model was fine-tuned using Supervised Fine-Tuning (SFT) on a dataset of mathematics problems and step-by-step solutions. The training used QLoRA to efficiently adapt the model while keeping most parameters frozen.

Training configuration:

  • QLoRA with rank 16
  • 1 epoch
  • Learning rate: 2.0e-4
  • Batch size: 8 (effective batch size with gradient accumulation: 16)
  • BF16 precision
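
As a reference, here is a minimal sketch of this setup using TRL's SFTTrainer with a 4-bit base model and a rank-16 LoRA adapter. The dataset choice, LoRA alpha, and target modules below are illustrative assumptions; the exact values live in the GitHub repository linked below.

import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

# Load the base model in 4-bit NF4 quantization (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-0.6B", quantization_config=bnb_config, device_map="auto"
)

# Rank-16 LoRA adapter; alpha and target modules are assumptions.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,                # assumption
    target_modules="all-linear",  # assumption
    task_type="CAUSAL_LM",
)

# Convert a math dataset into chat-format messages. The source dataset
# and 10k split are placeholders for the repo's data-preparation script.
def to_messages(example):
    return {"messages": [
        {"role": "system", "content": "Solve the given math problem step by step, showing all your work."},
        {"role": "user", "content": example["question"]},
        {"role": "assistant", "content": example["answer"]},
    ]}

dataset = load_dataset("microsoft/orca-math-word-problems-200k", split="train[:10000]")
dataset = dataset.map(to_messages, remove_columns=dataset.column_names)

# Hyperparameters from the list above: 1 epoch, lr 2e-4, bf16,
# per-device batch size 8 with gradient accumulation 2 (effective 16).
training_args = SFTConfig(
    output_dir="Qwen3-0.6B-math-orca-qlora-10k-ep1",
    num_train_epochs=1,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,
    learning_rate=2.0e-4,
    bf16=True,
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    peft_config=peft_config,
)
trainer.train()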

Code and Reproducibility

The code for this project is available on GitHub: https://github.com/tyfeng1997/qwen3-finetune

The repository includes scripts for:

  • Data preparation
  • Training with QLoRA
  • Merging weights (see the first sketch after this list)
  • Evaluation on math benchmarks
  • Deployment with vLLM (see the second sketch after this list)
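
For merging, a sketch with PEFT's merge_and_unload might look like this; the adapter path is a placeholder, and the repository's merge script is authoritative:

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the full-precision base model and attach the trained LoRA adapter.
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-0.6B", torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base, "path/to/qlora-adapter")  # placeholder path

# Fold the LoRA weights into the base weights and save a standalone checkpoint.
merged = model.merge_and_unload()
merged.save_pretrained("Qwen3-0.6B-math-orca-qlora-10k-ep1")
AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B").save_pretrained("Qwen3-0.6B-math-orca-qlora-10k-ep1")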
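
And a minimal offline-inference sketch with vLLM (the sampling parameters are illustrative):

from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_id = "tyfeng1997/Qwen3-0.6B-math-orca-qlora-10k-ep1"
llm = LLM(model=model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [
    {"role": "system", "content": "Solve the given math problem step by step, showing all your work."},
    {"role": "user", "content": "If 8x + 5 = 3x - 15, what is the value of x?"},
]
# Apply the model's chat template, then generate with vLLM.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = llm.generate([prompt], SamplingParams(temperature=0.2, max_tokens=512))
print(outputs[0].outputs[0].text)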

Framework versions

  • TRL: 0.18.0.dev0
  • Transformers: 4.52.0.dev0
  • PyTorch: 2.6.0
  • Datasets: 3.5.1
  • Tokenizers: 0.21.1

Usage and Limitations

This model is specifically optimized for mathematical reasoning and may not perform as well on general-purpose tasks. It excels at step-by-step problem solving for high-school-level mathematics.

Citations

Cite TRL as:

@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}

If you use this model in your research, please cite:

@misc{qwen3-0.6B-math,
    author = {Feng, Bo},
    title = {Qwen3-0.6B-math: Fine-tuned small language model for mathematical reasoning},
    year = {2025},
    publisher = {GitHub},
    howpublished = {\url{https://github.com/tyfeng1997/qwen3-finetune}}
}