view article Article DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge Feb 7, 2025 • 269
michaelbenayoun/qwen3-tiny-4kv-heads-8layers-random Text Generation • 6.61M • Updated Oct 30, 2025 • 27
michaelbenayoun/qwen3-tiny-4kv-heads-4layers-random Text Generation • 5.47M • Updated Oct 30, 2025 • 20.7k
michaelbenayoun/qwen3-tiny-4kv-heads-4layers-random Text Generation • 5.47M • Updated Oct 30, 2025 • 20.7k
michaelbenayoun/qwen3-tiny-4kv-heads-8layers-random Text Generation • 6.61M • Updated Oct 30, 2025 • 27
michaelbenayoun/deepseekv3-tiny-4kv-heads-4-layers-random Text Generation • 5.27M • Updated Jul 24, 2025 • 4
michaelbenayoun/deepseekv3-tiny-4kv-heads-4-layers-random Text Generation • 5.27M • Updated Jul 24, 2025 • 4