ZClip: Adaptive Spike Mitigation for LLM Pre-Training • Paper 2504.02507 • Published 3 days ago
Variance Control via Weight Rescaling in LLM Pre-training • Paper 2503.17500 • Published 15 days ago