Qwen2.5-Math-7B-DPO-10K • Fine-tuned for Mathematical Reasoning


Qwen2.5-Math-7B-DPO-10K is a fine-tuned version of Qwen/Qwen2.5-Math-7B, optimized for mathematical reasoning through Direct Preference Optimization (DPO) on the Math-Step-DPO-10K dataset. The model specializes in generating step-by-step solutions to mathematical problems across domains including algebra, calculus, and geometry.
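
For reference, DPO (Rafailov et al., 2023) trains the policy directly on preference pairs, with no separate reward model; the card does not indicate any modification to the standard objective:

$$
\mathcal{L}_{\mathrm{DPO}}(\theta) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\!\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
$$

Here $y_w$ and $y_l$ are the preferred and dispreferred solutions for prompt $x$, $\pi_{\mathrm{ref}}$ is the frozen base model, and $\beta$ controls how far the tuned policy may drift from it. In Step-DPO-style data, the pairs typically differ at individual reasoning steps rather than whole solutions.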

🧮 Training Details

  • Base Model: Qwen/Qwen2.5-Math-7B
  • Fine-tuning Method: LoRA (Low-Rank Adaptation)
  • Framework: mlx_lm.lora (Apple MLX)
  • Hardware: Apple Silicon Mac
  • Dataset: Math-Step-DPO-10K
  • Objective: Enhance step-by-step mathematical reasoning through parameter-efficient adaptation
  • Training hyperparameters (a reproduction sketch follows this list):
    • optimizer: adamw
    • Training iterations: 50
    • Learning rate: 1e-5
  • LoRA Configuration:
    • Rank: 8
    • Alpha (scale): 10
    • Dropout: 0
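
The card does not include the training invocation itself. Below is a minimal sketch of how the hyperparameters above could map onto an mlx_lm.lora YAML config. The file name and data path are hypothetical, the optimizer key assumes an mlx-lm release that exposes the optimizer option, and note that stock mlx_lm.lora performs supervised fine-tuning, so the DPO preference pairs would need to be prepared for whatever trainer was actually used.

# lora_config.yaml — hypothetical reproduction sketch, not the author's actual config
model: "Qwen/Qwen2.5-Math-7B"
train: true
data: "data/math-step-dpo-10k"   # assumed local path to the prepared dataset
optimizer: adamw                 # assumes an mlx-lm version with the optimizer option
iters: 50
learning_rate: 1e-5
lora_parameters:
  rank: 8
  scale: 10.0
  dropout: 0.0

# Launch with:
# mlx_lm.lora --config lora_config.yaml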

💻 Usage

# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("HenryShan/Qwen2.5-Math-7B-DPO-10K")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)
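
generate caps output length by default, which can truncate long multi-step solutions; max_tokens is a standard mlx_lm.generate keyword for raising the limit (the value here is illustrative):

text = generate(model, tokenizer, prompt=prompt, max_tokens=512, verbose=True)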

📄 License

Qwen2.5-Math-7B-DPO-10K is licensed under the Apache License 2.0. It is fine-tuned from Qwen/Qwen2.5-Math-7B, which is released under the same Apache 2.0 license.

✍️ Citation

@misc{haotian_shan_2025,
    author       = { Haotian Shan },
    title        = { Qwen2.5-Math-7B-DPO-10K (Revision e4f4bb3) },
    year         = 2025,
    url          = { https://huggingface.co/HenryShan/Qwen2.5-Math-7B-DPO-10K },
    doi          = { 10.57967/hf/5631 },
    publisher    = { Hugging Face }
}

Model size: 7.62B parameters · Tensor type: BF16 · Format: Safetensors