Article: DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge • By NormalUhr • Feb 7 • 209
mistralai/Mistral-7B-Instruct-v0.2 • Text Generation • 7B • Updated 28 days ago • 681k • 2.92k