Voyager: Long-Range and World-Consistent Video Diffusion for Explorable 3D Scene Generation • Paper • 2506.04225 • Published Jun 4 • 27
facebook/dinov3-vit7b16-pretrain-lvd1689m • Image Feature Extraction • 7B • Updated 20 days ago • 32k • 135
facebook/dinov3-convnext-large-pretrain-lvd1689m • Image Feature Extraction • 0.2B • Updated 20 days ago • 5.86k • 9
Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities • Paper • 2503.03983 • Published Mar 6 • 26
Cosmos-Transfer1: Conditional World Generation with Adaptive Multimodal Control • Paper • 2503.14492 • Published Mar 18 • 20
Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion • Paper • 2506.08009 • Published Jun 9 • 28
Intuitive physics understanding emerges from self-supervised pretraining on natural videos • Paper • 2502.11831 • Published Feb 17 • 20