ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs • Paper • 2506.18896 • Published 3 days ago • 25
Transformer Copilot: Learning from The Mistake Log in LLM Fine-tuning • Paper • 2505.16270 • Published May 22 • 6