SurgWorld: Learning Surgical Robot Policies from Videos via World Modeling Paper • 2512.23162 • Published 3 days ago • 8
Nested Browser-Use Learning for Agentic Information Seeking Paper • 2512.23647 • Published 2 days ago • 9
OmniAgent: Audio-Guided Active Perception Agent for Omnimodal Audio-Video Understanding Paper • 2512.23646 • Published 2 days ago • 13
DiRL: An Efficient Post-Training Framework for Diffusion Language Models Paper • 2512.22234 • Published 8 days ago • 16
GRAN-TED: Generating Robust, Aligned, and Nuanced Text Embedding for Diffusion Models Paper • 2512.15560 • Published 14 days ago • 21
Dream-VL & Dream-VLA: Open Vision-Language and Vision-Language-Action Models with Diffusion Language Model Backbone Paper • 2512.22615 • Published 4 days ago • 34
Diffusion Knows Transparency: Repurposing Video Diffusion for Transparent Object Depth and Normal Estimation Paper • 2512.23705 • Published 2 days ago • 33
Yume-1.5: A Text-Controlled Interactive World Generation Model Paper • 2512.22096 • Published 5 days ago • 51
Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss Paper • 2512.23447 • Published 2 days ago • 80
LongVideoAgent: Multi-Agent Reasoning with Long Videos Paper • 2512.20618 • Published 8 days ago • 52 • 3
CASA: Cross-Attention via Self-Attention for Efficient Vision-Language Fusion Paper • 2512.19535 • Published 9 days ago • 10
LoPA: Scaling dLLM Inference via Lookahead Parallel Decoding Paper • 2512.16229 • Published 13 days ago • 15
Memory-T1: Reinforcement Learning for Temporal Reasoning in Multi-session Agents Paper • 2512.20092 • Published 8 days ago • 8
LLM Swiss Round: Aggregating Multi-Benchmark Performance via Competitive Swiss-System Dynamics Paper • 2512.21010 • Published 7 days ago • 2