- Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning
  Paper • 2407.20798 • Published • 24
- Offline Reinforcement Learning for LLM Multi-Step Reasoning
  Paper • 2412.16145 • Published • 38
- REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
  Paper • 2501.03262 • Published • 92
- SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution
  Paper • 2502.18449 • Published • 67
Collections including paper arxiv:2502.18449
- ORPO: Monolithic Preference Optimization without Reference Model
  Paper • 2403.07691 • Published • 65
- sDPO: Don't Use Your Data All at Once
  Paper • 2403.19270 • Published • 41
- Teaching Large Language Models to Reason with Reinforcement Learning
  Paper • 2403.04642 • Published • 47
- Best Practices and Lessons Learned on Synthetic Data for Language Models
  Paper • 2404.07503 • Published • 30

- LoRA+: Efficient Low Rank Adaptation of Large Models
  Paper • 2402.12354 • Published • 6
- The FinBen: An Holistic Financial Benchmark for Large Language Models
  Paper • 2402.12659 • Published • 21
- TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization
  Paper • 2402.13249 • Published • 13
- TrustLLM: Trustworthiness in Large Language Models
  Paper • 2401.05561 • Published • 69

- StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback
  Paper • 2402.01391 • Published • 42
- Code Representation Learning At Scale
  Paper • 2402.01935 • Published • 13
- Long Code Arena: a Set of Benchmarks for Long-Context Code Models
  Paper • 2406.11612 • Published • 25
- Agentless: Demystifying LLM-based Software Engineering Agents
  Paper • 2407.01489 • Published • 61

- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 147
- ReFT: Reasoning with Reinforced Fine-Tuning
  Paper • 2401.08967 • Published • 30
- Tuning Language Models by Proxy
  Paper • 2401.08565 • Published • 23
- TrustLLM: Trustworthiness in Large Language Models
  Paper • 2401.05561 • Published • 69