MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning
Abstract
We present MM-Eureka, a multimodal reasoning model that successfully extends large-scale rule-based reinforcement learning (RL) to multimodal reasoning. While rule-based RL has shown remarkable success in improving LLMs' reasoning abilities in text domains, its application to multimodal settings has remained challenging. Our work reproduces key characteristics of text-based RL systems like DeepSeek-R1 in the multimodal space, including steady increases in accuracy reward and response length, and the emergence of reflection behaviors. We demonstrate that both instruction-tuned and pre-trained models can develop strong multimodal reasoning capabilities through rule-based RL without supervised fine-tuning, showing superior data efficiency compared to alternative approaches. We open-source our complete pipeline to foster further research in this area, releasing all of our code, models, and data at https://github.com/ModalMinds/MM-EUREKA.
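To make the "rule-based" part of the reward concrete, here is a minimal Python sketch of a binary accuracy reward of the kind described in the abstract: the model's final answer is extracted with a fixed rule (here, the last `\boxed{...}` span) and compared against the ground truth, with no learned reward model. The function names (`extract_boxed`, `accuracy_reward`) and the exact-match rule are illustrative assumptions, not the paper's actual implementation; see the linked repository for that.

```python
import re

def extract_boxed(text: str) -> str | None:
    """Return the content of the last \\boxed{...} span, if any.

    Note: this simple pattern does not handle nested braces; a real
    verifier would need a more robust parser.
    """
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1].strip() if matches else None

def accuracy_reward(response: str, ground_truth: str) -> float:
    """Binary rule-based reward: 1.0 on an exact answer match, else 0.0."""
    answer = extract_boxed(response)
    if answer is None:
        return 0.0
    return 1.0 if answer == ground_truth.strip() else 0.0

if __name__ == "__main__":
    sample = "Reasoning ... so the area is \\boxed{12}."
    print(accuracy_reward(sample, "12"))  # prints 1.0
```

Because the reward is a deterministic check rather than a learned model, it cannot be reward-hacked in the usual sense, which is part of why this style of RL scales with data; the trade-off is that it only applies to tasks with verifiable answers, such as the mathematical reasoning problems studied here.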
Community
The R1-Zero moment in multimodal mathematical reasoning is reproduced for the first time, along with large-scale training.
Did you also run RL on InternVL2.5-38B-Instruct? If so, I am curious about its benchmark results.