Streaming DiLoCo with overlapping communication: Towards a Distributed Free Lunch Paper • 2501.18512 • Published Jan 30, 2025
DiLoCo: Distributed Low-Communication Training of Language Models Paper • 2311.08105 • Published Nov 14, 2023
Communication-Efficient Language Model Training Scales Reliably and Robustly: Scaling Laws for DiLoCo Paper • 2503.09799 • Published Mar 2025
Scaling Language Models: Methods, Analysis & Insights from Training Gopher Paper • 2112.11446 • Published Dec 8, 2021