MiniMax-01 is Now Open-Source: Scaling Lightning Attention for the AI Agent Era
Today, MiniMax released and open-sourced the all-new MiniMax-01 series, which comprises two models: the foundational language model MiniMax-Text-01 and the visual multi-modal model MiniMax-VL-01.
Report link: https://filecdn.minimax.chat/_Arxiv_MiniMax_01_Report.pdf
Innovative Lightning Attention Architecture, with Top-tier Model Performance
In the MiniMax-01 series, we have made a bold innovation: for the first time on a large scale, we have implemented a novel Lightning Attention mechanism, offering a new alternative to the traditional Transformer architecture. The model has 456 billion parameters, of which 45.9 billion are activated per token. Its comprehensive performance is on par with the leading models globally, and it efficiently handles the world's longest context length of up to 4 million tokens, 20 to 32 times more than other leading models.
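As a rough sanity check on the figures above, the following sketch works out what share of the parameters is activated and which context lengths the "20 to 32 times" comparison implies. The constants are copied from this post; the helper names are ours, and the implied context lengths are plain arithmetic, not measurements of any specific model.

```python
# Figures quoted in this post.
TOTAL_PARAMS_B = 456.0          # total parameters, in billions
ACTIVE_PARAMS_B = 45.9          # activated parameters, in billions
MAX_CONTEXT_TOKENS = 4_000_000  # longest supported context, in tokens


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of the parameters that is activated."""
    return active_b / total_b


def implied_context(max_tokens: int, multiple: int) -> float:
    """Context length of a model whose window is `multiple` times shorter."""
    return max_tokens / multiple


frac = active_fraction(TOTAL_PARAMS_B, ACTIVE_PARAMS_B)
print(f"{frac:.1%}")  # 10.1%

for multiple in (20, 32):
    print(multiple, implied_context(MAX_CONTEXT_TOKENS, multiple))
```

In other words, roughly one tenth of the parameters is active at a time, and the 20x to 32x range corresponds to comparison windows of 200,000 and 125,000 tokens.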
Ultra-Long Context, Spearheading the AI Agent Era
We believe that 2025 will be a year of rapid development for AI Agents. Whether it's the need for sustained memory in single-Agent systems or the extensive inter-Agent communication in multi-Agent systems, increasingly long contexts are essential. With this model, we have taken the first step, and we aim to use this architecture to continually build the foundational capabilities required for complex Agents.
Ultimate Value for Continuous Innovation
Thanks to our architectural innovations, efficiency optimizations, integrated cluster training and inference design, and the extensive reuse of concurrent computing power within our infrastructure, we are able to offer text and multi-modal understanding APIs at the industry's most competitive price points. Our standard pricing is USD $0.2 per million input tokens and USD $1.1 per million output tokens. We invite everyone to experience and use our services on the MiniMax Open Platform.
MiniMax API platform: https://www.minimaxi.com/en/platform
MiniMax-01 series models are open-sourced and will continue to be updated regularly. They can be accessed at https://github.com/MiniMax-AI
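To make the standard pricing concrete, here is a minimal cost estimator. The two rates are the ones quoted in this post; the function name and the example token counts are illustrative assumptions, and actual billing may differ (for example through rounding or non-standard pricing tiers).

```python
from decimal import Decimal

# Standard rates quoted in this post, in USD per million tokens.
INPUT_USD_PER_MILLION = Decimal("0.2")
OUTPUT_USD_PER_MILLION = Decimal("1.1")


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the cost of one request at the standard rates."""
    million = Decimal(1_000_000)
    total = (Decimal(input_tokens) * INPUT_USD_PER_MILLION
             + Decimal(output_tokens) * OUTPUT_USD_PER_MILLION)
    return total / million


# Hypothetical request: a full 4-million-token context with a 2,000-token answer.
print(estimate_cost_usd(4_000_000, 2_000))
```

Decimal is used instead of float so that small per-request amounts add up exactly when summed over many calls.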
Based on mainstream benchmarks for text and multi-modal understanding, as shown in the figure below, we have matched the performance of the most advanced models. As illustrated in figure (c), MiniMax-Text-01 exhibits the least performance degradation as the input length increases.
Thanks to our architectural innovations, our model demonstrates exceptional efficiency when processing long inputs, approaching linear complexity. The comparison with other top-tier global models is as follows:
The structure we use is as follows: in every 8 layers, 7 use Lightning Attention, which is a form of linear attention, and 1 uses traditional SoftMax attention.
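The 7-to-1 layout can be sketched as a simple layer schedule. This is only an illustration: the post does not say where the SoftMax layer sits inside each group of 8, so placing it last is an assumption, and the function name and the 16-layer example are hypothetical.

```python
def layer_schedule(num_layers: int, block_size: int = 8) -> list[str]:
    """Return the attention type for each layer, one SoftMax layer per block.

    Assumption: the SoftMax layer is the last layer of every block.
    """
    if num_layers % block_size != 0:
        raise ValueError("num_layers must be a multiple of block_size")
    return [
        "softmax" if (index + 1) % block_size == 0 else "lightning"
        for index in range(num_layers)
    ]


# Hypothetical 16-layer stack: two blocks of 7 lightning layers + 1 softmax layer.
schedule = layer_schedule(16)
print(schedule.count("lightning"), schedule.count("softmax"))  # 14 2
```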
This marks the first time in the industry that the linear attention mechanism has been scaled to the level of commercial-grade models. We have comprehensively considered Scaling Law, integration with Mixture of Experts (MoE), structural design, training optimization, and inference optimization. Because this is the first time the industry has built a model of this scale that relies primarily on linear attention, we have rebuilt our training and inference systems almost from scratch. This includes more efficient MoE all-to-all communication, optimizations for longer sequences, and an efficient kernel implementation of the linear attention layer for inference.
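To show why linear attention scales close to linearly with sequence length, here is a minimal pure-Python sketch of generic causal linear attention. It is not the Lightning Attention kernel: it omits any feature map, normalization, decay, and tiling, and all names are ours. It only demonstrates that a running key-value state gives the same result as scoring every query against every earlier key.

```python
import random


def quadratic_causal_attention(q, k, v):
    """O(n^2 * d): score each query against every earlier key (no softmax)."""
    d_v = len(v[0])
    out = []
    for t in range(len(q)):
        row = [0.0] * d_v
        for s in range(t + 1):
            score = sum(a * b for a, b in zip(q[t], k[s]))
            for j in range(d_v):
                row[j] += score * v[s][j]
        out.append(row)
    return out


def linear_causal_attention(q, k, v):
    """O(n * d^2): carry a running d_k x d_v state instead of revisiting keys."""
    d_k, d_v = len(k[0]), len(v[0])
    state = [[0.0] * d_v for _ in range(d_k)]
    out = []
    for q_t, k_t, v_t in zip(q, k, v):
        # Fold the current key/value pair into the running state.
        for i in range(d_k):
            for j in range(d_v):
                state[i][j] += k_t[i] * v_t[j]
        # The output is the query applied to the accumulated state.
        out.append([sum(q_t[i] * state[i][j] for i in range(d_k))
                    for j in range(d_v)])
    return out


rng = random.Random(0)
n, d = 6, 4  # toy sizes, chosen only for the demonstration
q, k, v = ([[rng.uniform(-1, 1) for _ in range(d)] for _ in range(n)]
           for _ in range(3))
a = quadratic_causal_attention(q, k, v)
b = linear_causal_attention(q, k, v)
max_diff = max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
print(max_diff < 1e-9)
```

Because the state has a fixed size, the cost per token does not grow with the number of earlier tokens, which is the property that makes multi-million-token contexts tractable.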
On most academic benchmarks, we have achieved results on par with the top-tier models internationally:
On the benchmarks for long-context evaluation, we are significantly ahead:
We achieved 100% accuracy on the 4-million-token vanilla Needle-In-A-Haystack retrieval task:
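For readers unfamiliar with this kind of test, the sketch below shows the general shape of a needle-in-a-haystack check: a single fact is buried at different depths of a long filler text, and the model is asked to retrieve it. This is a generic illustration, not our evaluation harness; the filler text, the needle, the function names, and the toy_model stand-in are all hypothetical, and a real run would replace toy_model with a call to an actual model.

```python
import re

FILLER = "The sky was clear and the market was quiet that day."
NEEDLE = "The secret passphrase is blue-harbor-42."
QUESTION = "What is the secret passphrase?"
EXPECTED = "blue-harbor-42"


def build_haystack(filler: str, needle: str, num_fillers: int, depth: float) -> str:
    """Insert `needle` at relative `depth` (0.0 = start, 1.0 = end) of the filler."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    sentences = [filler] * num_fillers
    sentences.insert(round(depth * num_fillers), needle)
    return " ".join(sentences)


def run_needle_test(ask_model, depths, num_fillers: int = 1000) -> float:
    """Return the share of depths at which the answer contains the needle."""
    hits = 0
    for depth in depths:
        context = build_haystack(FILLER, NEEDLE, num_fillers, depth)
        answer = ask_model(context, QUESTION)
        hits += EXPECTED in answer
    return hits / len(depths)


def toy_model(context: str, question: str) -> str:
    """Stand-in for a real model: finds the passphrase with a regex."""
    match = re.search(r"passphrase is ([\w-]+)", context)
    return match.group(1) if match else ""


accuracy = run_needle_test(toy_model, [0.0, 0.25, 0.5, 0.75, 1.0])
print(accuracy)
```

Scaling num_fillers up until the context reaches the target token count, and sweeping both depth and length, yields the usual accuracy grid for this task.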
In addition to academic datasets, we have constructed a test set based on real-world data within our AI assistant scenario. In this scenario, the MiniMax-Text-01 model demonstrates a significant lead, with specific comparisons as follows:
In the multi-modal understanding test set, the MiniMax-VL-01 model also demonstrates a strong lead:
To facilitate more research by developers, we have open-sourced the complete weights of both models at https://github.com/MiniMax-AI. We are committed to uploading subsequent updates for this series of models, including enhancements related to code and multi-modal capabilities, as soon as they are available.
We chose to open-source for two main reasons: Firstly, we believe this work can inspire more research and applications in the field of long-context understanding, thereby accelerating the arrival of the AI Agent era. Secondly, open-sourcing will motivate us to pursue more innovations and ensure higher quality in our ongoing model development efforts.
In addition to open-sourcing and offering highly cost-effective APIs, our models can be accessed and used on Hailuo AI (hailuo.ai).
For any technical suggestions or collaboration inquiries, please feel free to contact us via email at [email protected].
Connect With Us
Twitter: https://x.com/Hailuo_AI
Instagram: https://www.instagram.com/hailuoai_official/
YouTube: https://www.youtube.com/@Hailuoai_MiniMax
TikTok: https://www.tiktok.com/@hailuoai_official
Discord: https://discord.gg/hailuoai
Hugging Face: https://huggingface.co/MiniMaxAI