arxiv:2501.12370
Dan Busbridge
dbusbridge
AI & ML interests
Deep learning, optimization, self-supervised learning, representation learning, large language modeling, equivariance, geometric deep learning, attention mechanisms, transformers
Recent Activity
authored a paper 2 days ago
Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models
authored a paper 4 months ago
Theory, Analysis, and Best Practices for Sigmoid Self-Attention
commented on a paper over 1 year ago
How to Scale Your EMA