-
I-Con: A Unifying Framework for Representation Learning
Paper • 2504.16929 • Published • 27 -
LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities
Paper • 2504.16078 • Published • 19 -
WALL-E 2.0: World Alignment by NeuroSymbolic Learning improves World Model-based LLM Agents
Paper • 2504.15785 • Published • 16 -
OTC: Optimal Tool Calls via Reinforcement Learning
Paper • 2504.14870 • Published • 32
Collections
Discover the best community collections!
Collections including paper arxiv:2504.14870
-
CoRAG: Collaborative Retrieval-Augmented Generation
Paper • 2504.01883 • Published • 10 -
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning
Paper • 2504.08837 • Published • 42 -
Mavors: Multi-granularity Video Representation for Multimodal Large Language Model
Paper • 2504.10068 • Published • 30 -
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations
Paper • 2504.10481 • Published • 84
-
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Paper • 2503.14476 • Published • 122 -
VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks
Paper • 2504.05118 • Published • 25 -
SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning
Paper • 2504.08600 • Published • 26 -
A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce
Paper • 2504.11343 • Published • 14
-
RuCCoD: Towards Automated ICD Coding in Russian
Paper • 2502.21263 • Published • 133 -
Unified Reward Model for Multimodal Understanding and Generation
Paper • 2503.05236 • Published • 123 -
Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching
Paper • 2503.05179 • Published • 46 -
R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning
Paper • 2503.05592 • Published • 27
-
START: Self-taught Reasoner with Tools
Paper • 2503.04625 • Published • 111 -
SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion
Paper • 2503.11576 • Published • 99 -
ToolRL: Reward is All Tool Learning Needs
Paper • 2504.13958 • Published • 40 -
OTC: Optimal Tool Calls via Reinforcement Learning
Paper • 2504.14870 • Published • 32
-
RL + Transformer = A General-Purpose Problem Solver
Paper • 2501.14176 • Published • 28 -
Towards General-Purpose Model-Free Reinforcement Learning
Paper • 2501.16142 • Published • 30 -
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Paper • 2501.17161 • Published • 120 -
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization
Paper • 2412.12098 • Published • 5
-
Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning
Paper • 2407.20798 • Published • 25 -
Offline Reinforcement Learning for LLM Multi-Step Reasoning
Paper • 2412.16145 • Published • 39 -
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
Paper • 2501.03262 • Published • 99 -
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution
Paper • 2502.18449 • Published • 74
-
LoRA+: Efficient Low Rank Adaptation of Large Models
Paper • 2402.12354 • Published • 6 -
The FinBen: An Holistic Financial Benchmark for Large Language Models
Paper • 2402.12659 • Published • 22 -
TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization
Paper • 2402.13249 • Published • 13 -
TrustLLM: Trustworthiness in Large Language Models
Paper • 2401.05561 • Published • 70