Dan Busbridge

dbusbridge

danbusbridge

AI & ML interests

Deep learning, optimization, self-supervised learning, representation learning, large language modeling, equivariance, geometric deep learning, attention mechanisms, transformers

Recent Activity

authored a paper 2 days ago

Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models

authored a paper 4 months ago

Theory, Analysis, and Best Practices for Sigmoid Self-Attention

commented on a paper over 1 year ago

How to Scale Your EMA

Papers 3

arxiv:2501.12370

arxiv:2409.04431

arxiv:2307.13813

Models

None public yet

Datasets

None public yet