Benchmarking Optimizers for Large Language Model Pretraining Paper • 2509.01440 • Published 8 days ago • 23
UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior Long-Context Learning Paper • 2508.18756 • Published 14 days ago • 36
Sample More to Think Less: Group Filtered Policy Optimization for Concise Reasoning Paper • 2508.09726 • Published 27 days ago • 13
Noise Hypernetworks: Amortizing Test-Time Compute in Diffusion Models Paper • 2508.09968 • Published 26 days ago • 15
Diffusion LLMs Can Do Faster-Than-AR Inference via Discrete Diffusion Forcing Paper • 2508.09192 • Published Aug 8 • 30