arxiv:2508.10433

We-Math 2.0: A Versatile MathBook System for Incentivizing Visual Mathematical Reasoning

Published on Aug 14
· Submitted by RichardQRQ on Aug 15
#1 Paper of the day

Abstract

We-Math 2.0 enhances MLLMs' mathematical reasoning through a structured knowledge system, model-centric data space modeling, and reinforcement learning, demonstrating competitive performance on benchmarks.

AI-generated summary

Multimodal Large Language Models (MLLMs) have demonstrated impressive capabilities across various tasks, but still struggle with complex mathematical reasoning. Existing research primarily focuses on dataset construction and method optimization, often overlooking two critical aspects: comprehensive knowledge-driven design and model-centric data space modeling. In this paper, we introduce We-Math 2.0, a unified system that integrates a structured mathematical knowledge system, model-centric data space modeling, and a reinforcement learning (RL)-based training paradigm to comprehensively enhance the mathematical reasoning abilities of MLLMs. The key contributions of We-Math 2.0 are fourfold: (1) MathBook Knowledge System: We construct a five-level hierarchical system encompassing 491 knowledge points and 1,819 fundamental principles. (2) MathBook-Standard & Pro: We develop MathBook-Standard, a dataset that ensures broad conceptual coverage and flexibility through dual expansion. Additionally, we define a three-dimensional difficulty space and generate 7 progressive variants per problem to build MathBook-Pro, a challenging dataset for robust training. (3) MathBook-RL: We propose a two-stage RL framework comprising: (i) Cold-Start Fine-tuning, which aligns the model with knowledge-oriented chain-of-thought reasoning; and (ii) Progressive Alignment RL, leveraging average-reward learning and dynamic data scheduling to achieve progressive alignment across difficulty levels. (4) MathBookEval: We introduce a comprehensive benchmark covering all 491 knowledge points with diverse reasoning step distributions. Experimental results show that MathBook-RL performs competitively with existing baselines on four widely-used benchmarks and achieves strong results on MathBookEval, suggesting promising generalization in mathematical reasoning.

Community

Paper submitter

🚀 Webpage: https://we-math2.github.io/
💻 Github: https://github.com/We-Math/We-Math2.0

💡 Overview

We-Math 2.0 is a unified system designed to comprehensively enhance the mathematical reasoning capabilities of Multimodal Large Language Models (MLLMs). It integrates a structured mathematical knowledge system, model-centric data space modeling, and a reinforcement learning (RL)-based training paradigm to achieve both broad conceptual coverage and robust reasoning performance across varying difficulty levels.

The key contributions of We-Math 2.0 are fourfold:

  • MathBook Knowledge System — A five-level hierarchical structure encompassing 491 knowledge points and 1,819 fundamental principles.
  • MathBook-Standard & MathBook-Pro — MathBook-Standard ensures wide conceptual coverage and flexibility via dual expansion, while MathBook-Pro defines a three-dimensional difficulty space and generates 7 progressive variants per problem for robust training.
  • MathBook-RL — A two-stage RL framework comprising Cold-Start Fine-tuning for knowledge-oriented chain-of-thought alignment, and Progressive Alignment RL with average-reward learning and dynamic data scheduling for gradual alignment across difficulty levels.
  • MathBookEval — A comprehensive benchmark covering all 491 knowledge points with diverse reasoning step distributions.
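To make the second stage of MathBook-RL more concrete, here is a minimal sketch of what "progressive alignment across difficulty levels" via average-reward learning and dynamic data scheduling could look like. All class and parameter names (and the advancement threshold) are illustrative assumptions for this sketch, not the paper's actual implementation.

```python
import random

class ProgressiveScheduler:
    """Hypothetical sketch: problems are grouped by difficulty level
    (easy -> hard), and the scheduler advances to the next level only
    once the average reward on the current level clears a threshold."""

    def __init__(self, levels, advance_threshold=0.7):
        self.levels = levels              # list of problem lists, easy -> hard
        self.current = 0                  # index of the active difficulty level
        self.advance_threshold = advance_threshold

    def sample_batch(self, batch_size):
        # Draw training problems only from the current difficulty level.
        pool = self.levels[self.current]
        return random.sample(pool, min(batch_size, len(pool)))

    def update(self, rewards):
        # Average reward over the batch decides whether to move up a level.
        avg = sum(rewards) / len(rewards)
        if avg >= self.advance_threshold and self.current < len(self.levels) - 1:
            self.current += 1
        return avg
```

The design choice sketched here is that scheduling is driven by an aggregate (average) reward signal rather than per-sample outcomes, so the curriculum only hardens once the model is reliably solving the current tier.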

šŸƒšŸƒšŸƒAll images in our training set are manually and meticulously crafted using GeoGebra software, ensuring they are newly created, precise, and surpass common Python-based rendering methods in spatial geometric rigor and complexity



You can visit our Project Page to view the complete version of the content below!

MathBook Knowledge System


MathBook-Standard


MathBook-Pro



Reflections on Building We-Math 2.0

Two observations shaped this work. MathVista revealed that model performance on high school mathematics problems was surprisingly close to performance on elementary-level ones, and our We-Math benchmark showed that models could solve complex problems yet often failed on their decomposed sub-problems. Together, these convinced us that a structured mathematical knowledge system is an inevitable step toward advancing multimodal mathematical reasoning: perhaps not the most critical factor today, but a foundational one for the future.

We therefore set out to construct a complete knowledge hierarchy and to create training data precisely mapped to each knowledge point. Initially, we consulted extensive materials to design the system, but soon realized that collected problems rarely aligned cleanly with specific knowledge points; most spanned several points with non-orthogonal coverage.

To address this, we chose an unconventional path: abandoning all collected problems in favor of authoring every question and diagram manually. Leveraging GeoGebra to ensure both the quality and the spatial complexity required for solid geometry, we embarked on a long and demanding construction process. Along the way, we observed that reinforcement learning showed promise for enhancing visual reasoning, inspiring us to define a three-dimensional difficulty space and generate progressively harder variants for selected seed problems—culminating in MathBook-Pro.
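One way to picture the "7 progressive variants per problem" in a three-dimensional difficulty space is as the 2³ − 1 = 7 non-trivial combinations of three difficulty axes being raised. This reading, and the axis names below, are assumptions made for illustration only; the paper defines its own dimensions.

```python
from itertools import product

# Placeholder axis names -- NOT the paper's actual difficulty dimensions.
AXES = ("step_complexity", "visual_complexity", "contextual_complexity")

def difficulty_variants(seed_problem):
    """Enumerate the 2^3 - 1 = 7 variants of a seed problem obtained by
    raising at least one of three binary difficulty axes."""
    variants = []
    for mask in product((0, 1), repeat=len(AXES)):
        if not any(mask):
            continue  # the all-zero point is the seed problem itself
        raised = tuple(axis for axis, on in zip(AXES, mask) if on)
        variants.append({"seed": seed_problem, "raised_axes": raised})
    return variants
```

Under this reading, each seed problem anchors the origin of its own small difficulty cube, and the variants trace progressively harder corners of that cube.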

While we experimented with various automated generation techniques, none met our quality standards, so we committed fully to manual creation.

After nearly a year of startup-level intensity, we release the first version of We-Math 2.0. We hope that this work will serve as a meaningful contribution to both the research community and the broader AI for Education field. All .ggb source files will be fully open-sourced, so that teachers and educators can adapt them freely and bring them into their classrooms.

We hope you enjoy it!

