Multi-module GRPO: Composing Policy Gradients and Prompt Optimization for Language Model Programs Paper • 2508.04660 • Published Aug 6 • 2
Seed-Prover: Deep and Broad Reasoning for Automated Theorem Proving Paper • 2507.23726 • Published Jul 31 • 112
Pixels, Patterns, but No Poetry: To See The World like Humans Paper • 2507.16863 • Published Jul 21 • 68
ConSens: Assessing context grounding in open-book question answering Paper • 2505.00065 • Published Apr 30 • 1