view article Article Fine-Tuning Your First Large Language Model (LLM) with PyTorch and Hugging Face Feb 11, 2025 • 104
Rethinking the Trust Region in LLM Reinforcement Learning Paper • 2602.04879 • Published 7 days ago • 30
Length-Unbiased Sequence Policy Optimization: Revealing and Controlling Response Length Variation in RLVR Paper • 2602.05261 • Published 7 days ago • 48
view article Article Preference Tuning LLMs with Direct Preference Optimization Methods +3 Jan 18, 2024 • 78
view article Article Transformers v5: Simple model definitions powering the AI ecosystem +2 Dec 1, 2025 • 296
view article Article Nemotron 3 Nano \- A new Standard for Efficient, Open, and Intelligent Agentic Models Dec 15, 2025 • 107
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization Paper • 2601.05242 • Published Jan 8 • 224
WPO: Enhancing RLHF with Weighted Preference Optimization Paper • 2406.11827 • Published Jun 17, 2024 • 17
view article Article Tokenization in Transformers v5: Simpler, Clearer, and More Modular +4 Dec 18, 2025 • 119