Submitted by huangsiteng 95 VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model · 16 authors 1
Submitted by TianxiangMa 83 HuMo: Human-Centric Video Generation via Collaborative Multi-Modal Conditioning · 10 authors 145 1
Submitted by Haozhan72 52 SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning · 21 authors 473 1
Submitted by Yoohao 50 EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs · 7 authors 8 2
Submitted by taesiri 33 Kling-Avatar: Grounding Multimodal Instructions for Cascaded Long-Duration Avatar Animation Synthesis · 14 authors 1
Submitted by Jarvis1111 30 Harnessing Uncertainty: Entropy-Modulated Policy Gradients for Long-Horizon LLM Agents · 10 authors 1
Submitted by taesiri 24 FLUX-Reason-6M & PRISM-Bench: A Million-Scale Text-to-Image Reasoning Dataset and Comprehensive Benchmark · 10 authors 37 1
Submitted by LanguageBind 23 Can Understanding and Generation Truly Benefit Together -- or Just Coexist? · 14 authors 1
Submitted by taesiri 13 SpatialVID: A Large-Scale Video Dataset with Spatial Annotations · 15 authors 1
Submitted by ManTle 7 Visual Programmability: A Guide for Code-as-Thought in Chart Understanding · 9 authors 12 1
Submitted by Kaichengalex 5 Gradient-Attention Guided Dual-Masking Synergetic Framework for Robust Text-based Person Retrieval · 6 authors 9 1
Submitted by learn12138 5 2D Gaussian Splatting with Semantic Alignment for Image Inpainting · 4 authors 1
Submitted by taesiri 2 LoCoBench: A Benchmark for Long-Context Large Language Models in Complex Software Engineering · 17 authors 4 1
Submitted by taesiri 2 OmniEVA: Embodied Versatile Planner via Task-Adaptive 3D-Grounded and Embodiment-aware Reasoning · 13 authors 1
Submitted by weipang142857 2 The Choice of Divergence: A Neglected Key to Mitigating Diversity Collapse in Reinforcement Learning with Verifiable Reward · 10 authors 1
Submitted by oravus 1 ObjectReact: Learning Object-Relative Control for Visual Navigation · 8 authors
Submitted by renkelin 1 Modality Alignment with Multi-scale Bilateral Attention for Multimodal Recommendation · 3 authors 0 1
Submitted by iliashum - Reasoning Introduces New Poisoning Attacks Yet Makes Them More Complicated · 7 authors 1