- AttentionInfluence: Adopting Attention Head Influence for Weak-to-Strong Pretraining Data Selection (arXiv:2505.07293, May 2025)
- Nemotron-CC: Transforming Common Crawl into a Refined Long-Horizon Pretraining Dataset (arXiv:2412.02595, Dec 3, 2024)
- NEMOTRON-CROSSTHINK: Scaling Self-Learning beyond Math Reasoning (arXiv:2504.13941, Apr 2025)
- A Strategic Coordination Framework of Small LLMs Matches Large LLMs in Data Synthesis (arXiv:2504.12322, Apr 11, 2025)
- Recitation over Reasoning: How Cutting-Edge Language Models Can Fail on Elementary School-Level Reasoning Problems? (arXiv:2504.00509, Apr 1, 2025)
- How to Get Your LLM to Generate Challenging Problems for Evaluation (arXiv:2502.14678, Feb 20, 2025)
- Unveiling Downstream Performance Scaling of LLMs: A Clustering-Based Perspective (arXiv:2502.17262, Feb 24, 2025)
- SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model (arXiv:2502.02737, Feb 4, 2025)
- MAGA: MAssive Genre-Audience Reformulation to Pretraining Corpus Expansion (arXiv:2502.04235, Feb 6, 2025)