Article: Illustrating Reinforcement Learning from Human Feedback (RLHF) • By natolambert and 3 others • Dec 9, 2022
LLM Embeddings Explained: A Visual and Intuitive Guide • How Language Models Turn Text into Meaning, From Traditional
Collection: Light-R1 • Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond • 7 items • Updated Mar 13
Article: Open-R1: a fully open reproduction of DeepSeek-R1 • By eliebak and 2 others • Jan 28