SkyLadder: Better and Faster Pretraining via Context Window Scheduling Paper • 2503.15450 • Published 5 days ago • 11
FlexWorld: Progressively Expanding 3D Scenes for Flexiable-View Synthesis Paper • 2503.13265 • Published 7 days ago • 15
Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers Paper • 2503.11579 • Published 10 days ago • 17
ChatMusician: Understanding and Generating Music Intrinsically with LLM Paper • 2402.16153 • Published Feb 25, 2024 • 60
Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners Paper • 2402.17723 • Published Feb 27, 2024 • 16
ComposerX: Multi-Agent Symbolic Music Composition with LLMs Paper • 2404.18081 • Published Apr 28, 2024 • 2
Mixed Neural Voxels for Fast Multi-view Video Synthesis Paper • 2212.00190 • Published Dec 1, 2022
MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions Paper • 2407.20962 • Published Jul 30, 2024
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines Paper • 2502.14739 • Published Feb 20 • 97