FlexWorld: Progressively Expanding 3D Scenes for Flexiable-View Synthesis Paper • 2503.13265 • Published 7 days ago • 15
Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers Paper • 2503.11579 • Published 10 days ago • 17
VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search Paper • 2503.10582 • Published 11 days ago • 20
YuE: Scaling Open Foundation Models for Long-Form Music Generation Paper • 2503.08638 • Published 13 days ago • 59
ABC: Achieving Better Control of Multimodal Embeddings using VLMs Paper • 2503.00329 • Published 24 days ago • 18