ZeRO: Memory Optimizations Toward Training Trillion Parameter Models Paper • 1910.02054 • Published Oct 4, 2019 • 5
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published Jan 22 • 362
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models Paper • 2402.03300 • Published Feb 5, 2024 • 109
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer Paper • 1910.10683 • Published Oct 23, 2019 • 11
Solving math word problems with process- and outcome-based feedback Paper • 2211.14275 • Published Nov 25, 2022 • 9
The Perfect Blend: Redefining RLHF with Mixture of Judges Paper • 2409.20370 • Published Sep 30, 2024 • 5
Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation Paper • 2401.08417 • Published Jan 16, 2024 • 35