Abstract
Magistral, a scalable reinforcement learning pipeline built from the ground up, demonstrates that pure RL on text can train a reasoning model without distilled traces from prior models, while maintaining or improving multimodal understanding, instruction following, and function calling.
We introduce Magistral, Mistral's first reasoning model and our own scalable reinforcement learning (RL) pipeline. Instead of relying on existing implementations and RL traces distilled from prior models, we follow a ground-up approach, relying solely on our own models and infrastructure. Notably, we demonstrate a stack that enabled us to explore the limits of pure RL training of LLMs, present a simple method to force the reasoning language of the model, and show that RL on text data alone maintains most of the initial checkpoint's capabilities. We find that RL on text maintains or improves multimodal understanding, instruction following, and function calling. We present Magistral Medium, trained for reasoning on top of Mistral Medium 3 with RL alone, and we open-source Magistral Small (Apache 2.0), which further includes cold-start data from Magistral Medium.
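The abstract does not spell out how the reasoning language is forced. One plausible mechanism, sketched below purely as an illustration and not as the paper's actual implementation, is a language-consistency term in the RL reward: the model earns a small bonus only when its chain of thought is written in the same language as the user's prompt. The function name, the bonus value, and the use of the `langdetect` library are all assumptions.

```python
# Minimal sketch (assumption, not the paper's implementation) of a
# language-consistency reward for RL training of a reasoning model.
from langdetect import detect  # pip install langdetect

def language_consistency_reward(prompt: str, reasoning: str,
                                bonus: float = 0.1) -> float:
    """Return `bonus` if the reasoning trace is written in the same
    language as the prompt, else 0. Hypothetical reward-shaping term."""
    try:
        return bonus if detect(reasoning) == detect(prompt) else 0.0
    except Exception:
        # langdetect raises on empty or ambiguous text; treat as no bonus.
        return 0.0
```

A term like this would simply be added to the task reward (e.g., answer correctness) during RL, nudging the policy toward reasoning in the user's language without any hard decoding constraint.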
Community
Mistral's first reasoning model 🥳
Librarian Bot: the following similar papers were recommended by the Semantic Scholar API.
- Phi-4-Mini-Reasoning: Exploring the Limits of Small Reasoning Language Models in Math (2025)
- Phi-4-reasoning Technical Report (2025)
- Incentivizing LLMs to Self-Verify Their Answers (2025)
- Token-Efficient RL for LLM Reasoning (2025)
- Learning to Reason Across Parallel Samples for LLM Reasoning (2025)
- Interleaved Reasoning for Large Language Models via Reinforcement Learning (2025)
- HAPO: Training Language Models to Reason Concisely via History-Aware Policy Optimization (2025)