view article Article Illustrating Reinforcement Learning from Human Feedback (RLHF) By natolambert and 3 others • Dec 9, 2022 • 293
Light-R1 Collection Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond • 7 items • Updated Mar 13 • 12
view article Article Open-R1: a fully open reproduction of DeepSeek-R1 By eliebak and 2 others • Jan 28 • 870