Self-Rewarding Vision-Language Model via Reasoning Decomposition Paper • 2508.19652 • Published 12 days ago • 80
Loong: Synthesize Long Chain-of-Thoughts at Scale through Verifiers Paper • 2509.03059 • Published 5 days ago • 19
Re:Form -- Reducing Human Priors in Scalable Formal Software Verification with RL in LLMs: A Preliminary Study on Dafny Paper • 2507.16331 • Published Jul 22 • 18