SLiC-HF: Sequence Likelihood Calibration with Human Feedback Paper • 2305.10425 • Published May 17, 2023 • 6
Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning Paper • 2508.08221 • Published 28 days ago • 45
On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification Paper • 2508.05629 • Published Aug 7 • 170
panda-gym: Open-source goal-conditioned environments for robotic learning Paper • 2106.13687 • Published Jun 25, 2021 • 3
Open RL Benchmark: Comprehensive Tracked Experiments for Reinforcement Learning Paper • 2402.03046 • Published Feb 5, 2024 • 7
Distributional Preference Alignment of LLMs via Optimal Transport Paper • 2406.05882 • Published Jun 9, 2024 • 2
Article Accelerate ND-Parallel: A Guide to Efficient Multi-GPU Training By siro1 and 4 others • Aug 8 • 59
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models Paper • 2508.06471 • Published about 1 month ago • 175
EMA Without the Lag: Bias-Corrected Iterate Averaging Schemes Paper • 2508.00180 • Published Jul 31 • 1
Article Vision Language Model Alignment in TRL ⚡️ By sergiopaniego and 4 others • Aug 7 • 78
gpt-oss Collection Open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases. • 2 items • Updated Aug 7 • 338
Article Welcome GPT OSS, the new open-source model family from OpenAI! By reach-vb and 11 others • Aug 5 • 490
Article Introducing Trackio: A Lightweight Experiment Tracking Library from Hugging Face By abidlabs and 4 others • Jul 29 • 170
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization Paper • 2411.10442 • Published Nov 15, 2024 • 87
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning Paper • 2506.01939 • Published Jun 2 • 180