Toward Evaluative Thinking: Meta Policy Optimization with Evolving Reward Models Paper • 2504.20157 • Published 10 days ago • 34
Do LLM Agents Have Regret? A Case Study in Online Learning and Games Paper • 2403.16843 • Published Mar 25, 2024 • 2