Flow-DPO: Improving LLM Mathematical Reasoning through Online Multi-Agent Learning • Paper • 2410.22304 • Published 19 days ago • 14
OpenWebVoyager: Building Multimodal Web Agents via Iterative Real-World Exploration, Feedback and Optimization • Paper • 2410.19609 • Published 23 days ago • 15
Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation • Paper • 2411.00412 • Published 16 days ago • 9
Improving Autonomous AI Agents with Reflective Tree Search and Self-Learning • Paper • 2410.02052 • Published Oct 2 • 9