Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning Paper • 2506.01939 • Published Jun 2 • 170
The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models Paper • 2505.22617 • Published May 28 • 125
SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond Paper • 2505.19641 • Published May 26 • 67
VerIPO: Cultivating Long Reasoning in Video-LLMs via Verifier-Gudied Iterative Policy Optimization Paper • 2505.19000 • Published May 25 • 43
AdaCtrl: Towards Adaptive and Controllable Reasoning via Difficulty-Aware Budgeting Paper • 2505.18822 • Published May 24 • 14
QwenLong-CPRS: Towards infty-LLMs with Dynamic Context Optimization Paper • 2505.18092 • Published May 23 • 44
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning Paper • 2505.17667 • Published May 23 • 89
Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models Paper • 2505.04921 • Published May 8 • 179
FuseChat-3.0: Preference Optimization Meets Heterogeneous Model Fusion Paper • 2503.04222 • Published Mar 6 • 15
🧠Reasoning datasets Collection Datasets with reasoning traces for math and code released by the community • 24 items • Updated May 19 • 161
view article Article FuseO1-Preview: System-II Reasoning Fusion of LLMs By Wanfq and 4 others • Jan 20 • 21
view article Article FuseChat-3.0: Preference Optimization for Implicit Model Fusion By Wanfq and 2 others • Dec 18, 2024 • 5
FuseChat 3.0 Collection Preference Optimization for Implicit Model Fusion • 11 items • Updated Jan 16 • 1
FuseChat 3.0 Collection Preference Optimization for Implicit Model Fusion • 14 items • Updated Mar 7 • 14
Weighted-Reward Preference Optimization for Implicit Model Fusion Paper • 2412.03187 • Published Dec 4, 2024 • 12
Tiny Series Collection Tiny datasets that empower the foundation of Small Language Model! • 11 items • Updated Jan 26, 2024 • 39