Qwen2.5-Math-7B-DPO-10K • Fine-tuned for Mathematical Reasoning


Qwen2.5-Math-7B-DPO-10K is a fine-tuned version of Qwen/Qwen2.5-Math-7B, optimized for mathematical reasoning through Direct Preference Optimization (DPO) on the Math-Step-DPO-10K dataset. The model specializes in generating step-by-step solutions to mathematical problems across domains including algebra, calculus, and geometry.
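
For reference, DPO (Rafailov et al., 2023) trains the policy directly on preference pairs, with no separate reward model; the card does not indicate any modification to the standard objective:

$$
\mathcal{L}_{\mathrm{DPO}}(\theta) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\!\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
$$

Here $y_w$ and $y_l$ are the preferred and dispreferred solutions for prompt $x$, $\pi_{\mathrm{ref}}$ is the frozen base model, and $\beta$ controls how far the tuned policy may drift from it. In Step-DPO-style data, the pairs typically differ at individual reasoning steps rather than whole solutions.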

🧮 Training Details

  • Base Model: Qwen/Qwen2.5-Math-7B
  • Fine-tuning Method: LoRA (Low-Rank Adaptation)
  • Framework: mlx_lm.lora (Apple MLX)
  • Hardware: Apple Silicon Mac
  • Dataset: Math-Step-DPO-10K
  • Objective: Enhance step-by-step mathematical reasoning through parameter-efficient adaptation
  • Training hyperparameters (a reproduction sketch follows this list):
    • optimizer: adamw
    • Training iterations: 50
    • Learning rate: 1e-5
  • LoRA Configuration:
    • Rank: 8
    • Alpha (scale): 10
    • Dropout: 0
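
The card does not include the training invocation itself. Below is a minimal sketch of how the hyperparameters above could map onto an mlx_lm.lora YAML config. The file name and data path are hypothetical, the optimizer key assumes an mlx-lm release that exposes the optimizer option, and note that stock mlx_lm.lora performs supervised fine-tuning, so the DPO preference pairs would need to be prepared for whatever trainer was actually used.

# lora_config.yaml — hypothetical reproduction sketch, not the author's actual config
model: "Qwen/Qwen2.5-Math-7B"
train: true
data: "data/math-step-dpo-10k"   # assumed local path to the prepared dataset
optimizer: adamw                 # assumes an mlx-lm version with the optimizer option
iters: 50
learning_rate: 1e-5
lora_parameters:
  rank: 8
  scale: 10.0
  dropout: 0.0

# Launch with:
# mlx_lm.lora --config lora_config.yaml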

💻 Usage

# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("HenryShan/Qwen2.5-Math-7B-DPO-10K")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)
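
generate caps output length by default, which can truncate long multi-step solutions; max_tokens is a standard mlx_lm.generate keyword for raising the limit (the value here is illustrative):

text = generate(model, tokenizer, prompt=prompt, max_tokens=512, verbose=True)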

📄 License

Qwen2.5-Math-7B-DPO-10K is licensed under the Apache License 2.0. It is fine-tuned from Qwen/Qwen2.5-Math-7B, which is released under the same Apache 2.0 license.

✍️ Citation

@misc{haotian_shan_2025,
    author       = { Haotian Shan },
    title        = { Qwen2.5-Math-7B-DPO-10K (Revision e4f4bb3) },
    year         = 2025,
    url          = { https://huggingface.co/HenryShan/Qwen2.5-Math-7B-DPO-10K },
    doi          = { 10.57967/hf/5631 },
    publisher    = { Hugging Face }
}

Model size: 7.62B parameters · Tensor type: BF16 · Format: Safetensors