Paris: A Decentralized Trained Open-Weight Diffusion Model • Paper • 2510.03434 • Published 22 days ago
Fantastic Pretraining Optimizers and Where to Find Them • Paper • 2509.02046 • Published Sep 2
BeyondWeb: Lessons from Scaling Synthetic Data for Trillion-scale Pretraining • Paper • 2508.10975 • Published Aug 14
Step-3 is Large yet Affordable: Model-system Co-design for Cost-effective Decoding • Paper • 2507.19427 • Published Jul 25