Qwen2.5-Math-7B-DPO-10K • Fine-tuned for Mathematical Reasoning
Qwen2.5-Math-7B-DPO-10K is a fine-tuned version of Qwen2.5-Math-7B, optimized for mathematical reasoning via Direct Preference Optimization (DPO) on the Math-Step-DPO-10K dataset. The model specializes in generating step-by-step solutions to mathematical problems across domains including algebra, calculus, and geometry.
🧮 Training Details
- Base Model: Qwen/Qwen2.5-Math-7B
- Fine-tuning Method: LoRA (Low-Rank Adaptation)
- Framework: mlx_lm.lora (Apple MLX)
- Hardware: Apple Silicon Mac
- Dataset: Math-Step-DPO-10K
- Objective: Enhance step-by-step mathematical reasoning through parameter-efficient adaptation
- Parameters:
  - Optimizer: AdamW
  - Training iterations: 50
  - Learning rate: 1e-5
- LoRA Configuration:
  - Rank: 8
  - Alpha (scale): 10
  - Dropout: 0
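As a pointer for reproduction, a run with the hyperparameters above could be launched with mlx_lm's LoRA trainer roughly as follows. This is a sketch: the dataset path is a placeholder, and flag names may differ across mlx-lm versions.

```shell
# Sketch of the fine-tuning invocation; the --data path is a placeholder.
mlx_lm.lora \
  --model Qwen/Qwen2.5-Math-7B \
  --train \
  --data ./math-step-dpo-10k \
  --iters 50 \
  --learning-rate 1e-5
```

Rank, alpha, and dropout are typically supplied through a YAML config (e.g. a `lora_parameters` section with `rank: 8`, `scale: 10`, `dropout: 0`) rather than as command-line flags; check the mlx-lm documentation for the version you have installed.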
💻 Usage
Make sure mlx-lm is installed:

```shell
pip install --upgrade mlx-lm
```

Then load the model and generate a step-by-step solution:

```python
from mlx_lm import load, generate

# Load the fine-tuned model and tokenizer from the Hugging Face Hub
model, tokenizer = load("HenryShan/Qwen2.5-Math-7B-DPO-10K")

prompt = "Solve for x: 2x + 5 = 17. Show your reasoning step by step."
messages = [{"role": "user", "content": prompt}]

# Apply the chat template so the model sees the expected conversation format
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)
```
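The same generation can also be run from the command line with mlx_lm's generate entry point. A sketch, assuming a recent mlx-lm install (flag names may vary by version):

```shell
# One-off generation from the CLI; fetches the model on first use
mlx_lm.generate \
  --model HenryShan/Qwen2.5-Math-7B-DPO-10K \
  --prompt "Solve for x: 2x + 5 = 17. Show your steps."
```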
License
Qwen2.5-Math-7B-DPO-10K is released under the Apache License 2.0. It is fine-tuned from Qwen2.5-Math-7B, which is also licensed under Apache 2.0.
✍️ Citation

```bibtex
@misc{haotian_shan_2025,
  author    = {Haotian Shan},
  title     = {Qwen2.5-Math-7B-DPO-10K (Revision e4f4bb3)},
  year      = 2025,
  url       = {https://huggingface.co/HenryShan/Qwen2.5-Math-7B-DPO-10K},
  doi       = {10.57967/hf/5631},
  publisher = {Hugging Face}
}
```