Why Distillation can Outperform Zero-RL: The Role of Flexible Reasoning Paper • 2505.21067 • Published May 2025
Rethinking the Sampling Criteria in Reinforcement Learning for LLM Reasoning: A Competence-Difficulty Alignment Perspective Paper • 2505.17652 • Published May 2025
Not All Correct Answers Are Equal: Why Your Distillation Source Matters Paper • 2505.14464 • Published May 2025
Model Merging in Pre-training of Large Language Models Paper • 2505.12082 • Published May 2025
SampleMix: A Sample-wise Pre-training Data Mixing Strategy by Coordinating Data Quality and Diversity Paper • 2503.01506 • Published Mar 3, 2025
IOPO: Empowering LLMs with Complex Instruction Following via Input-Output Preference Optimization Paper • 2411.06208 • Published Nov 9, 2024
Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems Paper • 2408.16293 • Published Aug 29, 2024
Scaling Synthetic Data Creation with 1,000,000,000 Personas Paper • 2406.20094 • Published Jun 28, 2024
Chameleon: Mixed-Modal Early-Fusion Foundation Models Paper • 2405.09818 • Published May 16, 2024