The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text • arXiv:2506.05209 • Jun 2025
Play to Generalize: Learning to Reason Through Game Play • arXiv:2506.08011 • Jun 2025
ReDit: Reward Dithering for Improved LLM Policy Optimization • arXiv:2506.18631 • Jun 2025
Through the Valley: Path to Effective Long CoT Training for Small Language Models • arXiv:2506.07712 • Jun 2025
Institutional Books 1.0: A 242B token dataset from Harvard Library's collections, refined for accuracy and usability • arXiv:2506.08300 • Jun 2025
Comment on The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity • arXiv:2506.09250 • Jun 2025
A Technical Study into Small Reasoning Language Models • arXiv:2506.13404 • Jun 2025
Wait, We Don't Need to "Wait"! Removing Thinking Tokens Improves Reasoning Efficiency • arXiv:2506.08343 • Jun 2025
Frankentexts: Stitching random text fragments into long-form narratives • arXiv:2505.18128 • May 23, 2025
SuperWriter: Reflection-Driven Long-Form Generation with Large Language Models • arXiv:2506.04180 • Jun 2025
BLEUBERI: BLEU is a surprisingly effective reward for instruction following • arXiv:2505.11080 • May 16, 2025
Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space • arXiv:2505.15778 • May 21, 2025
HoPE: Hybrid of Position Embedding for Length Generalization in Vision-Language Models • arXiv:2505.20444 • May 26, 2025