alibaba-pai/DistilQwen2.5-R1-32B


AndrewYan committed on Mar 18

Commit 6019b34 (verified) · 1 parent: acd3b97

Update README.md


README.md +88 -3


@@ -1,3 +1,88 @@
----
-license: apache-2.0
----

+## 📖 Introduction
+# DistilQwen2.5-R1 Series: Advanced Reasoning Models
+## Overview
+As large language models (LLMs) evolve toward deep reasoning capabilities, deploying them in resource-constrained environments (e.g., mobile devices, edge computing) remains challenging. The DistilQwen2.5-R1 series addresses this by transferring reasoning capabilities from ultra-large models (e.g., DeepSeek-R1) to compact models through innovative distillation techniques, achieving high performance while reducing computational costs.
+## Key Innovations
+### 1. Cognitive Trajectory Adaptation Framework
+- **Challenge**: Discrepancies in reasoning paths between large and small models (e.g., small models struggle to comprehend large models' high-level problem-solving logic)
+- **Solutions**:
+  - **Phase 1: CoT Data Optimization**
+    - Difficulty grading of large model reasoning chains (simple/medium/hard) via LLM-as-a-Judge
+    - Adaptive adjustments: Expand simple chains and simplify complex chains to create medium-difficulty datasets digestible by small models
+  - **Phase 2: Preference Optimization**
+    - Generate contrastive data pairs containing correct/incorrect reasoning paths
+    - Apply DPO algorithm with tailored configurations to enhance reasoning path discrimination
+### 2. Performance Highlights
+- **DistilQwen2.5-R1-7B** outperforms comparable distilled models (e.g., OpenThinker-7B) across multiple benchmarks
+- Successfully transfers high-order reasoning patterns originally dependent on large model parameter scales
+## Technical Advantages
+- Dynamic data optimization eliminates cognitive trajectory discrepancies
+- Two-stage training balances reasoning accuracy and computational efficiency
+- Enables complex task reasoning in edge computing environments
+## 🚀 Quick Start
+The following code snippet uses `apply_chat_template` to show how to load the tokenizer and model and how to generate content.
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+model_name = "alibaba-pai/DistilQwen2.5-R1-32B"
+
+# Load the model; device_map="auto" places the weights on the available device(s).
+model = AutoModelForCausalLM.from_pretrained(
+    model_name,
+    torch_dtype="auto",
+    device_map="auto"
+)
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+
+prompt = "Give me a short introduction to large language model."
+messages = [
+    {"role": "user", "content": prompt}
+]
+# Render the chat messages into the model's prompt format.
+text = tokenizer.apply_chat_template(
+    messages,
+    tokenize=False,
+    add_generation_prompt=True
+)
+# Move the inputs to the device the model was loaded onto.
+model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
+generated_ids = model.generate(
+    **model_inputs,
+    max_new_tokens=2048,
+)
+# Drop the prompt tokens so only the newly generated tokens are decoded.
+generated_ids = [
+    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
+]
+response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
+print(response)
+```
+## 🔍 Evaluation
+We compared the DistilQwen2.5-R1 series with leading reasoning models on four benchmarks:
+### 7B Model Comparison
+| Model                          | Training Data Size | AIME2024 | MATH-500 | GPQA Diamond | LiveCodeBench V2 |
+|--------------------------------|--------------------|----------|----------|--------------|------------------|
+| DeepSeek-R1-Distill-Qwen-7B    | 800k               | 55.5     | 92.8     | 49.1         | -                |
+| Bespoke-Stratos-7B             | 17k                | 20.0     | 82.0     | 37.8         | 36.1             |
+| OpenThinker-7B                 | 114k               | 31.3     | 83.0     | 42.4         | 39.9             |
+| **DistilQwen2.5-R1-7B**        | 105k               | 43.33    | 88.4     | 42.93        | 46.38            |
+### 32B Model Comparison
+| Model                          | Training Data Size | AIME2024 | MATH-500 | GPQA Diamond | LiveCodeBench V2 |
+|--------------------------------|--------------------|----------|----------|--------------|------------------|
+| DeepSeek-R1-Distill-Qwen-32B   | 800k               | 72.6     | 94.3     | 62.1         | -                |
+| Sky-T1-32B-Preview             | 17k                | 43.3     | 86.4     | 56.8         | -                |
+| OpenThinker-32B                | 114k               | 66.0     | 90.6     | 61.6         | 68.9             |
+| **DistilQwen2.5-R1-32B**       | 105k               | 70.0     | 93.8     | 62.12        | 65.95            |
+
+Key highlights:
+- DistilQwen2.5-R1 models achieve competitive performance while using about **7.6× less training data** than the DeepSeek-R1-Distill-Qwen series (105k vs. 800k samples)
+- Maintains open-source training lineage using filtered OpenThoughts subsets
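+- DistilQwen2.5-R1-7B leads on LiveCodeBench V2 among the 7B models with reported scores (46.38 vs. 39.9 for OpenThinker-7B); at 32B, OpenThinker-32B scores higher (68.9 vs. 65.95)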
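
The two-phase recipe described under "Cognitive Trajectory Adaptation Framework" can be illustrated with a minimal sketch. This is not the authors' implementation: the difficulty labels, the `adapt_chain` router, and its `expand`/`simplify` callbacks are hypothetical names introduced here, and the README does not specify the "tailored configurations" used for DPO. The `dpo_loss` function is the standard DPO objective for a single preference pair, with `beta=0.1` chosen only as an illustrative default.

```python
import math

# Hypothetical difficulty labels, as an LLM judge might return them.
SIMPLE, MEDIUM, HARD = "simple", "medium", "hard"


def adapt_chain(chain, difficulty, expand, simplify):
    """Phase 1 sketch: route a reasoning chain by its judged difficulty.

    `expand` and `simplify` are caller-supplied rewriters (in practice,
    LLM calls); they are placeholders, not part of any real API.
    """
    if difficulty == SIMPLE:
        return expand(chain)    # add intermediate steps
    if difficulty == HARD:
        return simplify(chain)  # condense to what a small model can follow
    return chain                # medium chains are kept as-is


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Phase 2 sketch: standard DPO loss for one (chosen, rejected) pair.

    Inputs are summed token log-probabilities of each full response
    under the policy and under the frozen reference model.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)) equals softplus(-logits); compute it stably.
    x = -logits
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))
```

With identical policy and reference log-probabilities the loss equals `log(2)`, and it falls below that value as the policy assigns relatively more probability to the correct reasoning path than the reference does.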