RLHS: Mitigating Misalignment in RLHF with Hindsight Simulation • Paper • 2501.08617 • Published 28 days ago
Mind Your Step (by Step): Chain-of-Thought can Reduce Performance on Tasks where Thinking Makes Humans Worse • Paper • 2410.21333 • Published Oct 27, 2024