Submitted by kuznetsoffandrey · 93 upvotes · Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models · 5 authors
Submitted by wujie10 · 59 upvotes · Seedance 1.0: Exploring the Boundaries of Video Generation Models · 44 authors
Submitted by Hanyuezhuohua · 43 upvotes · Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation · 5 authors
Submitted by imryanxu · 42 upvotes · ComfyUI-R1: Exploring Reasoning Models for Workflow Generation · 8 authors
Submitted by akhaliq · 42 upvotes · Autoregressive Adversarial Post-Training for Real-Time Interactive Video Generation · 9 authors
Submitted by hassid · 27 upvotes · Auto-Regressive vs Flow-Matching: a Comparative Study of Modeling Paradigms for Text-to-Music Generation · 3 authors
Submitted by LongMountain · 22 upvotes · SeerAttention-R: Sparse Attention Adaptation for Long Reasoning · 15 authors
Submitted by jy-yuan · 15 upvotes · Give Me FP32 or Give Me Death? Challenges and Solutions for Reproducible Reasoning · 10 authors
Submitted by Lemoncoke · 15 upvotes · SWE-Flow: Synthesizing Software Engineering Data in a Test-Driven Manner · 9 authors
Submitted by zhenzhiwang · 12 upvotes · InterActHuman: Multi-Concept Human Animation with Layout-Aligned Audio Conditions · 8 authors
Submitted by niveck · 11 upvotes · Time to Talk: LLM Agents for Asynchronous Group Communication in Mafia Games · 3 authors
Submitted by WaltonFuture · 9 upvotes · Vision Matters: Simple Visual Perturbations Can Boost Multimodal Math Reasoning · 7 authors
Submitted by guqiao · 8 upvotes · SAFE: Multitask Failure Detection for Vision-Language-Action Models · 7 authors
Submitted by taesiri · 5 upvotes · Hidden in plain sight: VLMs overlook their visual representations · 4 authors
Submitted by ashawkey · 4 upvotes · Efficient Part-level 3D Object Generation via Dual Volume Packing · 10 authors
Submitted by NikV09 · 4 upvotes · UFM: A Simple Path towards Unified Dense Correspondence with Flow · 12 authors
Submitted by Zory · 4 upvotes · Can Vision Language Models Infer Human Gaze Direction? A Controlled Study · 10 authors
Submitted by sungwon95 · 3 upvotes · Cross-Frame Representation Alignment for Fine-Tuning Video Diffusion Models · 5 authors
Submitted by j-morano · 2 upvotes · MIRAGE: Multimodal foundation model and benchmark for comprehensive retinal OCT image analysis · 10 authors
Submitted by wy1iu · 2 upvotes · Reparameterized LLM Training via Orthogonal Equivalence Transformation · 6 authors
Submitted by SushantGautam · 1 upvote · Kvasir-VQA-x1: A Multimodal Dataset for Medical Reasoning and Robust MedVQA in Gastrointestinal Endoscopy · 3 authors
Submitted by fangwu97 · 1 upvote · When to Trust Context: Self-Reflective Debates for Context Reliability · 8 authors
Submitted by Prakamya · TTT-Bench: A Benchmark for Evaluating Reasoning Ability with Simple and Novel Tic-Tac-Toe-style Games · 6 authors
Submitted by TreeForest · A Call for Collaborative Intelligence: Why Human-Agent Systems Should Precede AI Autonomy · 13 authors