Article: Illustrating Reinforcement Learning from Human Feedback (RLHF) • By natolambert and 3 others • Dec 9, 2022 • 252
Collection: 🔍 Interpretability & Analysis of LMs • Outstanding research in LM interpretability and evaluation, summarized • 109 items • Updated 19 days ago • 100