Submitted by Wenxuan123 139 Spatial Forcing: Implicit Spatial Representation Alignment for Vision-language-action Model HKUSTGZ 67 4
Submitted by xiaochonglinghu 104 Advancing End-to-End Pixel Space Generative Modeling via Self-supervised Pre-training AMAP-ML 111 3
Submitted by gowitheflow 94 Scaling Language-Centric Omnimodal Representation Learning DAMO Academy 23 4
Submitted by Everything-is-Ok 94 DITING: A Multi-Agent Evaluation Framework for Benchmarking Web Novel Translation CLAIN-WHU 9 2
Submitted by taesiri 33 FlashVSR: Towards Real-Time Diffusion-Based Streaming Video Super-Resolution · 7 authors 348 3
Submitted by raymin0223 29 Temporal Alignment Guidance: On-Manifold Sampling in Diffusion Models KAIST AI 2
Submitted by Ray2333 25 ERA: Transforming VLMs into Embodied Agents via Embodied Prior Learning and Online Reinforcement Learning University of Illinois at Urbana-Champaign 2
Submitted by dongyuanjushi 21 R-WoM: Retrieval-augmented World Model For Computer-use Agents · 7 authors 2
Submitted by Wayne-King 19 SRUM: Fine-Grained Self-Rewarding for Unified Multimodal Models The University of Hong Kong 72 3
Submitted by taesiri 15 UniFusion: Vision-Language Model as Unified Encoder in Image Generation Adobe 3
Submitted by simonycl 15 Verbalized Sampling: How to Mitigate Mode Collapse and Unlock LLM Diversity Stanford NLP 214 3
Submitted by TokerZ 14 Memory as Action: Autonomous Context Curation for Long-Horizon Agentic Tasks Beijing JiaoTong University 2
Submitted by XingweiT 14 Deconstructing Attention: Investigating Design Principles for Effective Language Modeling · 3 authors 2
Submitted by NeoZ123 12 Boundary-Guided Policy Optimization for Memory-efficient RL of Diffusion Large Language Models Z.ai 7 2
Submitted by taesiri 9 SAIL-Embedding Technical Report: Omni-modal Embedding Foundation Model ByteDance 2
Submitted by ruihangxu 8 ContextGen: Contextual Layout Anchoring for Identity-Consistent Multi-Instance Generation · 4 authors 22 2
Submitted by stefan-baumann 5 What If: Understanding Motion Through Sparse Interactions CompVis 15 2
Submitted by MasterZhou 5 The Geometry of Reasoning: Flowing Logics in Representation Space · 5 authors 13 2
Submitted by orpatashnik 5 Kontinuous Kontext: Continuous Strength Control for Instruction-based Image Editing · 6 authors 2
Submitted by codezakh 4 One Life to Learn: Inferring Symbolic World Models for Stochastic Environments from Unguided Exploration · 5 authors 1 2
Submitted by ArmelRandy 4 LLM Reasoning for Machine Translation: Synthetic Data Generation over Thinking Tokens ALMAnaCH (Inria) 1 2
Submitted by Franck-Dernoncourt 4 MLLM as a UI Judge: Benchmarking Multimodal LLMs for Predicting Human Perception of User Interfaces · 15 authors 2
Submitted by linghan199 3 ExpVid: A Benchmark for Experiment Video Understanding & Reasoning OpenGVLab 6 2
Submitted by sunweiwei 3 Scaling LLM Multi-turn RL with End-to-end Summarization-based Context Management · 7 authors 2
Submitted by CuiLong7 2 ViCO: A Training Strategy towards Semantic Aware Dynamic High-Resolution OpenGVLab 2
Submitted by YongdingTao 2 Detecting Data Contamination from Reinforcement Learning Post-training for Large Language Models Peking University 3 2
Submitted by ConnorZhong 1 Mitigating the Noise Shift for Denoising Generative Models via Noise Awareness Guidance THUML @ Tsinghua University 2
Submitted by ShuoChen99 1 Bag of Tricks for Subverting Reasoning-based Safety Guardrails · 9 authors 2
Submitted by JiayuDing 1 Information-Preserving Reformulation of Reasoning Traces for Antidistillation Microsoft 2
Submitted by southKH 1 Diffusion-Link: Diffusion Probabilistic Model for Bridging the Audio-Text Modality Gap · 5 authors 2
Submitted by MasterZhou 1 Why Do Transformers Fail to Forecast Time Series In-Context? · 4 authors 12 2
Submitted by cesun 1 ReFIne: A Framework for Trustworthy Large Reasoning Models with Reliability, Faithfulness, and Interpretability · 4 authors 2
Submitted by zhengda1936 - dInfer: An Efficient Inference Framework for Diffusion Language Models · 23 authors 2