Submitted by lixiaochuan · 84 · DropletVideo: A Dataset and Approach to Explore Integral Spatio-Temporal Consistent Video Generation · 13 authors · 2
Submitted by tellarin · 58 · Being-0: A Humanoid Robotic Agent with Vision-Language Models and Modular Skills · 9 authors · 2
Submitted by limuloo1999 · 40 · DreamRenderer: Taming Multi-Instance Attribute Control in Large-Scale Text-to-Image Models · 4 authors · 3
Submitted by yyyyyyjjjjzzz · 36 · SPIN-Bench: How Well Do LLMs Plan Strategically and Reason Socially? · 8 authors · 3
Submitted by akhaliq · 25 · R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization · 7 authors · 2
Submitted by Orannue · 24 · Edit Transfer: Learning Image Editing via Vision In-Context Relations · 4 authors · 7
Submitted by ZyZcuhk · 23 · BlobCtrl: A Unified and Flexible Framework for Element-level Image Generation and Editing · 9 authors · 2
Submitted by jmhb · 20 · MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research · 23 authors · 2
Submitted by Lingaaaaaaa · 16 · WideRange4D: Enabling High-Quality 4D Reconstruction with Wide-Range Movements and Scenes · 8 authors · 2
Submitted by ZhaofengWu · 15 · reWordBench: Benchmarking and Improving the Robustness of Reward Models with Transformed Inputs · 6 authors · 2
Submitted by lwpyh · 11 · V-STaR: Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning · 6 authors · 2
Submitted by Luo-Yihong · 9 · Rewards Are Enough for Fast Photo-Realistic Text-to-image Generation · 5 authors · 2
Submitted by Buzz-lightyear · 9 · Long-Video Audio Synthesis with Multi-Agent Collaboration · 5 authors · 3
Submitted by soarhigh · 6 · Sightation Counts: Leveraging Sighted User Feedback in Building a BLV-aligned Dataset of Diagram Descriptions · 7 authors · 2
Submitted by k-nick · 5 · Error Analyses of Auto-Regressive Video Diffusion Models: A Unified Framework · 8 authors · 2
Submitted by FQiao · 3 · GenStereo: Towards Open-World Generation of Stereo Images and Unsupervised Matching · 4 authors · 3
Submitted by JesseTNRoberts · 3 · Investigating Human-Aligned Large Language Model Uncertainty · 4 authors · 2
Submitted by Sckathach · 3 · Using Mechanistic Interpretability to Craft Adversarial Attacks against Large Language Models · 3 authors · 2
Submitted by zxbsmk · 2 · WISA: World Simulator Assistant for Physics-Aware Text-to-Video Generation · 12 authors · 2