Reason for Future, Act for Now: A Principled Framework for Autonomous LLM Agents with Provable Sample Efficiency • Paper • 2309.17382 • Published Sep 29, 2023
Offline Reinforcement Learning for LLM Multi-Step Reasoning • Paper • 2412.16145 • Published Dec 20, 2024
Self-Exploring Language Models: Active Preference Elicitation for Online Alignment • Paper • 2405.19332 • Published May 29, 2024