Cosmos-Transfer1: Conditional World Generation with Adaptive Multimodal Control Paper • 2503.14492 • Published 6 days ago • 16
GenDec: A robust generative Question-decomposition method for Multi-hop reasoning Paper • 2402.11166 • Published Feb 17, 2024 • 1
Open-World Skill Discovery from Unsegmented Demonstrations Paper • 2503.10684 • Published 13 days ago • 5
API Agents vs. GUI Agents: Divergence and Convergence Paper • 2503.11069 • Published 11 days ago • 31
MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research Paper • 2503.13399 • Published 7 days ago • 20
Being-0: A Humanoid Robotic Agent with Vision-Language Models and Modular Skills Paper • 2503.12533 • Published 8 days ago • 60
Pixel-Level Reasoning Segmentation via Multi-turn Conversations Paper • 2502.09447 • Published Feb 13 • 1
"Principal Components" Enable A New Language of Images Paper • 2503.08685 • Published 13 days ago • 11
SEACrowd: A Multilingual Multimodal Data Hub and Benchmark S Collection SEACrowd is a community movement project aimed at centralizing and standardizing AI resources for Southeast Asian languages, cultures, and/or regions. • 3 items • Updated Jun 18, 2024 • 8
Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia Paper • 2503.07920 • Published 14 days ago • 95
SEA-VL: Multicultural VL Dataset for Southeast Asia Collection Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia • 3 items • Updated 13 days ago • 14
INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge Paper • 2411.19799 • Published Nov 29, 2024 • 13
Multi-Level Knowledge Distillation for Out-of-Distribution Detection in Text Paper • 2211.11300 • Published Nov 21, 2022 • 1
FactCG: Enhancing Fact Checkers with Graph-Based Multi-Hop Data Paper • 2501.17144 • Published Jan 28 • 6
Taking Notes Brings Focus? Towards Multi-Turn Multimodal Dialogue Learning Paper • 2503.07002 • Published 15 days ago • 38
TrajectoryCrafter: Redirecting Camera Trajectory for Monocular Videos via Diffusion Models Paper • 2503.05638 • Published 17 days ago • 17