CogVLA: Cognition-Aligned Vision-Language-Action Model via Instruction-Driven Routing & Sparsification Paper • 2508.21046 • Published Aug 28
Optimus-3: Towards Generalist Multimodal Minecraft Agents with Scalable Task Experts Paper • 2506.10357 • Published Jun 12