---
language: en
license: mit
library_name: llama.cpp
tags:
  - llama.cpp
  - gguf
  - quantized
  - mimo
  - reasoning
base_model: XiaomiMiMo/MiMo-7B-RL
base_model_relation: quantized
---

# MiMo-7B-RL (GGUF)

This is a GGUF quantized version of [XiaomiMiMo/MiMo-7B-RL](https://huggingface.co/XiaomiMiMo/MiMo-7B-RL), converted from the original SafeTensors weights for use with llama.cpp, Ollama, LM Studio, and other GGUF-compatible inference engines.

## Model Description

MiMo-7B-RL is a 7-billion-parameter language model developed by Xiaomi, designed for enhanced reasoning in mathematics and code. Xiaomi reports that the original model matches the performance of OpenAI's o1-mini on many reasoning benchmarks.

### Model Details

- **Original Model**: MiMo-7B-RL by Xiaomi
- **Parameters**: 7 billion
- **Context Length**: 32,768 tokens
- **Architecture**: Modified transformer with 36 layers, 32 attention heads
- **Original Format**: SafeTensors
- **Converted Format**: GGUF
- **License**: MIT
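
After downloading, a quick sanity check is to read the file's fixed-size GGUF header (magic bytes, version, tensor count, metadata key/value count). A minimal sketch in Python; the file path is whatever you downloaded:

```python
import struct

GGUF_MAGIC = b"GGUF"

def read_gguf_header(path):
    """Read the fixed GGUF header: 4-byte magic, little-endian
    uint32 version, uint64 tensor count, uint64 metadata KV count."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != GGUF_MAGIC:
            raise ValueError(f"not a GGUF file: magic={magic!r}")
        version, = struct.unpack("<I", f.read(4))
        tensor_count, = struct.unpack("<Q", f.read(8))
        kv_count, = struct.unpack("<Q", f.read(8))
    return {"version": version,
            "tensor_count": tensor_count,
            "metadata_kv_count": kv_count}
```

A truncated or corrupted download typically fails the magic check immediately, which is cheaper than waiting for the loader to error out.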

Key features of the original model:

- Trained using a specialized pre-training strategy focused on reasoning tasks
- Fine-tuned with reinforcement learning on 130K mathematics and code problems
- Demonstrates superior performance in both mathematical reasoning and coding tasks
- Matches performance of much larger models in reasoning capabilities

## Usage

### With Ollama

```bash
ollama run mimo-7b-rl-q8
```
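
`ollama run` assumes the model is already registered. For a locally downloaded GGUF, a Modelfile can point Ollama at the file first (the filename and context-size setting below are illustrative, not shipped with this repo):

```
# Modelfile — illustrative; adjust the path to your downloaded file.
FROM ./mimo-7b-rl-q8.gguf
PARAMETER num_ctx 32768
```

Then register and run it with `ollama create mimo-7b-rl-q8 -f Modelfile` followed by `ollama run mimo-7b-rl-q8`.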

### With LM Studio

1. Load the model through the LM Studio interface
2. Select the GGUF file
3. Configure your desired settings
4. Start chatting!

### With llama.cpp

```bash
# Recent llama.cpp builds ship the CLI as `llama-cli`
# (older builds called it `./main`):
./llama-cli -m mimo-7b-rl-q8.gguf -n 4096
```

## Performance

The original model demonstrates impressive performance across various benchmarks:

| Benchmark                 | Score |
| ------------------------- | :---: |
| MATH-500 (Pass@1)         | 95.8% |
| AIME 2024 (Pass@1)        | 68.2% |
| AIME 2025 (Pass@1)        | 55.4% |
| LiveCodeBench v5 (Pass@1) | 57.8% |
| LiveCodeBench v6 (Pass@1) | 49.3% |

_Note: Performance metrics are from the original model. The GGUF conversion may show slightly different results due to quantization._

## Limitations and Biases

The model inherits any limitations and biases present in the original MiMo-7B-RL model. Additionally:

- Q8 quantization may result in slightly reduced performance compared to the original model
- The model requires careful prompt engineering for optimal results in reasoning tasks
- Performance may vary depending on the specific GGUF inference implementation used
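
For planning hardware, a back-of-the-envelope footprint for the Q8_0 weights can be computed from its block layout (32 int8 weights plus one fp16 scale per block, i.e. 8.5 bits per weight); the 7B figure is approximate:

```python
# Rough memory-footprint estimate for Q8_0 weights.
# Q8_0 stores blocks of 32 int8 values plus one fp16 scale:
# (32 + 2) bytes per 32 weights = 8.5 bits per weight.
PARAMS = 7e9           # ~7B parameters (approximate)
BITS_PER_WEIGHT = 8.5  # Q8_0 effective bit-width

weight_bytes = PARAMS * BITS_PER_WEIGHT / 8
print(f"~{weight_bytes / 1e9:.1f} GB for weights alone")
```

The KV cache and activations add to this, growing with context length, so total memory use at the full 32K context is noticeably higher.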

## Training Details

The model was trained by Xiaomi using:

- Pre-training on approximately 25 trillion tokens
- Three-stage data mixture strategy
- Multiple-Token Prediction as an additional training objective
- RL fine-tuning on 130K mathematics and code problems
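
Multiple-Token Prediction trains auxiliary heads to predict tokens several positions ahead, not just the next one. A toy sketch of such a loss (illustrative only; `mtp_loss`, the head layout, and the averaging are assumptions, not Xiaomi's implementation):

```python
import math

def mtp_loss(logits_per_head, token_ids):
    """Toy Multiple-Token Prediction loss.

    Head k (1-indexed) emits, at each position i, a list of
    vocabulary logits for the token k steps ahead (token_ids[i+k]).
    Returns the cross-entropy averaged over heads.
    """
    head_losses = []
    for k, logits in enumerate(logits_per_head, start=1):
        ce, n = 0.0, 0
        for i in range(len(token_ids) - k):
            row = logits[i]
            # Cross-entropy = log-sum-exp of logits minus target logit.
            log_z = math.log(sum(math.exp(x) for x in row))
            ce += log_z - row[token_ids[i + k]]
            n += 1
        head_losses.append(ce / n)
    return sum(head_losses) / len(head_losses)
```

The extra heads give the model a denser training signal per sequence; at inference time only the standard next-token head is needed.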

For detailed training information, please refer to the [original model card](https://huggingface.co/XiaomiMiMo/MiMo-7B-RL).

## Citation

If you use this model, please cite the original work:

```bibtex
@misc{xiaomi2025mimo,
  title={MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining},
  author={{Xiaomi LLM-Core Team}},
  year={2025},
  primaryClass={cs.CL},
  url={https://github.com/XiaomiMiMo/MiMo},
}
```

## Acknowledgments

- Original model development by Xiaomi LLM-Core Team
- GGUF conversion by Frank Denis