How Instruction and Reasoning Data shape Post-Training: Data Quality through the Lens of Layer-wise Gradients Paper • 2504.10766 • Published 1 day ago • 14
C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing Paper • 2504.07964 • Published 6 days ago • 58
Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill? Paper • 2504.06514 • Published 7 days ago • 33
Efficient Reinforcement Finetuning via Adaptive Curriculum Learning Paper • 2504.05520 • Published 8 days ago • 9 • 2
Difficulty Estimation Math Datasets Collection We perform difficulty estimation on popular math datasets. • 5 items • Updated 7 days ago • 1
Discovering Knowledge Deficiencies of Language Models on Massive Knowledge Base Paper • 2503.23361 • Published 17 days ago • 6 • 2