Unilogit: Robust Machine Unlearning for LLMs Using Uniform-Target Self-Distillation Paper • 2505.06027 • Published 30 days ago • 17
view article Article Navigating the RLHF Landscape: From Policy Gradients to PPO, GAE, and DPO for LLM Alignment By NormalUhr • Feb 11 • 41
Reward-Guided Speculative Decoding for Efficient LLM Reasoning Paper • 2501.19324 • Published Jan 31 • 40