Visual Representation Alignment for Multimodal Large Language Models Paper • 2509.07979 • Published 19 days ago • 81
LatticeWorld: A Multimodal Large Language Model-Empowered Framework for Interactive Complex World Generation Paper • 2509.05263 • Published 23 days ago • 10
Symbolic Graphics Programming with Large Language Models Paper • 2509.05208 • Published 23 days ago • 45
OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling Paper • 2509.12201 • Published 13 days ago • 102
Multimodal Reasoning for Science: Technical Report and 1st Place Solution to the ICML 2025 SeePhys Challenge Paper • 2509.06079 • Published 21 days ago • 6
Lost in Embeddings: Information Loss in Vision-Language Models Paper • 2509.11986 • Published 13 days ago • 25
PersonaX: Multimodal Datasets with LLM-Inferred Behavior Traits Paper • 2509.11362 • Published 14 days ago • 4
UI-S1: Advancing GUI Automation via Semi-online Reinforcement Learning Paper • 2509.11543 • Published 13 days ago • 46
MARS2 2025 Challenge on Multimodal Reasoning: Datasets, Methods, Results, Discussion, and Outlook Paper • 2509.14142 • Published 11 days ago • 9