From Loops to Oops: Fallback Behaviors of Language Models Under Uncertainty Paper • 2407.06071 • Published Jul 8, 2024
Transforming and Combining Rewards for Aligning Large Language Models Paper • 2402.00742 • Published Feb 1, 2024
Helping or Herding? Reward Model Ensembles Mitigate but do not Eliminate Reward Hacking Paper • 2312.09244 • Published Dec 14, 2023
Long-range Language Modeling with Self-retrieval Paper • 2306.13421 • Published Jun 23, 2023