arXiv:2511.06221

Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B

Published on Nov 9
Submitted by DenseHub on Nov 12
#2 Paper of the day
Authors:
Sen Xu et al.

Abstract

VibeThinker-1.5B, a 1.5B-parameter model using the Spectrum-to-Signal Principle, achieves superior reasoning capabilities compared to larger models at a significantly lower cost.

AI-generated summary

Challenging the prevailing consensus that small models inherently lack robust reasoning, this report introduces VibeThinker-1.5B, a 1.5B-parameter dense model developed via our Spectrum-to-Signal Principle (SSP). It counters the dominant approach of scaling model parameters to enhance capabilities, exemplified by models such as DeepSeek R1 (671B) and Kimi k2 (>1T). The SSP framework first employs a Two-Stage Diversity-Exploring Distillation (SFT) to generate a broad spectrum of solutions, followed by MaxEnt-Guided Policy Optimization (RL) to amplify the correct signal. With a total training cost of only $7,800, VibeThinker-1.5B demonstrates superior reasoning capabilities compared to closed-source models like Magistral Medium and Claude Opus 4, and performs on par with open-source models like GPT OSS-20B Medium. Remarkably, it surpasses the 400x larger DeepSeek R1 on three math benchmarks: AIME24 (80.3 vs. 79.8), AIME25 (74.4 vs. 70.0), and HMMT25 (50.4 vs. 41.7), a substantial improvement over its base model's scores of 6.7, 4.3, and 0.6, respectively. On LiveCodeBench V6, it scores 51.1, outperforming Magistral Medium's 50.3 and its base model's 0.0. These findings demonstrate that small models can achieve reasoning capabilities comparable to large models, drastically reducing training and inference costs and thereby democratizing advanced AI research.

Community

Through the innovative Spectrum-to-Signal Principle (SSP) training methodology, the 1.5B-parameter VibeThinker-1.5B surpasses models hundreds of times larger on multiple reasoning benchmarks, demonstrating at extremely low cost that small models can also achieve top-tier reasoning capability.
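The paper describes the RL stage only at a high level here, so the snippet below is a minimal sketch of an entropy-regularized (maximum-entropy) policy-gradient loss, assuming "MaxEnt-guided" optimization means rewarding verified-correct solutions while an entropy bonus preserves the solution diversity produced by the SFT stage. The function name, coefficient, and REINFORCE-style form are illustrative assumptions, not the paper's exact MGPO objective.

```python
# Sketch: entropy-regularized policy-gradient update (assumed form, not the paper's exact MGPO).
import torch
import torch.nn.functional as F

def maxent_policy_gradient_loss(logits, actions, rewards, entropy_coef=0.01):
    """logits: [batch, seq, vocab]; actions: [batch, seq]; rewards: [batch]."""
    log_probs = F.log_softmax(logits, dim=-1)
    # Log-probability of the sampled tokens (the generated solution).
    action_log_probs = log_probs.gather(-1, actions.unsqueeze(-1)).squeeze(-1).sum(dim=-1)
    # Mean token-level entropy: the maximum-entropy bonus that keeps outputs diverse.
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1).mean(dim=-1)
    # REINFORCE-style objective: amplify trajectories with positive reward
    # (e.g. verified-correct answers), plus the entropy bonus.
    return -(rewards * action_log_probs + entropy_coef * entropy).mean()

# Toy usage: batch of 2 sampled solutions, vocab of 8, length 5.
logits = torch.randn(2, 5, 8, requires_grad=True)
actions = torch.randint(0, 8, (2, 5))
rewards = torch.tensor([1.0, 0.0])  # e.g. verified correct vs. incorrect
maxent_policy_gradient_loss(logits, actions, rewards).backward()
```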

GitHub https://github.com/WeiboAI/VibeThinker


Paper author · Paper submitter:

An extreme test of whether a 1.5B model can achieve strong reasoning ability.

Paper author · Paper submitter:

SimpleTestForVibeThinker
A simple evaluation (we still recommend testing this model with competitive math / Python algorithm tasks).
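For readers who want to run such a test locally, below is a minimal sketch using Hugging Face transformers. The repository id, prompt, and sampling settings are assumptions based on the GitHub organization; check the model card and repository for the exact name and recommended generation parameters.

```python
# Sketch: prompting the model with a competition-style math problem (assumed repo id and settings).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "WeiboAI/VibeThinker-1.5B"  # assumed repository id; verify on the Hub
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user",
             "content": "Find the number of positive integers n <= 1000 such that n^2 + 1 is divisible by 5."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True,
                                       return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=2048, do_sample=True,
                         temperature=0.6, top_p=0.95)
# Print only the newly generated tokens (the model's reasoning and answer).
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```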


There is no point in testing LLMs, even math-specific ones, with tasks like this. LLMs won't be, and are not supposed to be, used this way. Please stop testing them like this; use a calculator or a tool instead.

VibeThinker is an astonishing model. I've already tested it on writing some algorithms and, despite its size, it handles them very well. Code optimization problems, though, are still unsolvable in any meaningful way.

Nice work! Would you kindly share more details, such as the RL training curves and SFT/RL performance?


Models citing this paper 2

Datasets citing this paper 0


Spaces citing this paper 3

Collections including this paper 1