MLLM as Retriever: Interactively Learning Multimodal Retrieval for Embodied Agents Paper • 2410.03450 • Published Oct 4, 2024 • 36
Molmo Collection Artifacts for open multimodal language models. • 5 items • Updated 12 days ago • 293
CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling Paper • 2409.19291 • Published Sep 28, 2024 • 19
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models Paper • 2409.17146 • Published Sep 25, 2024 • 106
Mamba-YOLO-World: Marrying YOLO-World with Mamba for Open-Vocabulary Detection Paper • 2409.08513 • Published Sep 13, 2024 • 12
ReCLAP: Improving Zero Shot Audio Classification by Describing Sounds Paper • 2409.09213 • Published Sep 13, 2024 • 12
MeshAnything V2: Artist-Created Mesh Generation With Adjacent Mesh Tokenization Paper • 2408.02555 • Published Aug 5, 2024 • 29