Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning Paper • 2505.01441 • Published 19 days ago • 35
LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities Paper • 2504.16078 • Published 24 days ago • 20
Emergent Agentic Transformer from Chain of Hindsight Experience Paper • 2305.16554 • Published May 26, 2023
DiaTool-DPO: Multi-Turn Direct Preference Optimization for Tool-Augmented Large Language Models Paper • 2504.02882 • Published Apr 2, 2025 • 7