Being-0: A Humanoid Robotic Agent with Vision-Language Models and Modular Skills • Paper • 2503.12533 • Published 13 days ago • 60
Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia • Paper • 2503.07920 • Published 18 days ago • 95
INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge • Paper • 2411.19799 • Published Nov 29, 2024 • 13
Multi-Level Knowledge Distillation for Out-of-Distribution Detection in Text • Paper • 2211.11300 • Published Nov 21, 2022 • 1
Taking Notes Brings Focus? Towards Multi-Turn Multimodal Dialogue Learning • Paper • 2503.07002 • Published 19 days ago • 38
MLLM as Retriever: Interactively Learning Multimodal Retrieval for Embodied Agents • Paper • 2410.03450 • Published Oct 4, 2024 • 36
GenDec: A robust generative Question-decomposition method for Multi-hop reasoning • Paper • 2402.11166 • Published Feb 17, 2024 • 1
Towards General Computer Control: A Multimodal Agent for Red Dead Redemption II as a Case Study • Paper • 2403.03186 • Published Mar 5, 2024 • 5
SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages • Paper • 2406.10118 • Published Jun 14, 2024 • 32
Aya Dataset: An Open-Access Collection for Multilingual Instruction Tuning • Paper • 2402.06619 • Published Feb 9, 2024 • 55