CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization • Paper • 2507.06181 • Published 12 days ago
Configurable Preference Tuning ⚙️📝 • Collection • CPT uses rubric-guided synthetic data and DPO so that LLMs can adjust their behavior (e.g., writing style) at inference time through system prompts • 7 items • Updated Jun 17