BlenderFusion: 3D-Grounded Visual Editing and Generative Compositing · 5 authors · submitted by cccjc · 41 upvotes
LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs · 4 authors · submitted by StarJiaxing · 29 upvotes
XVerse: Consistent Multi-Subject Control of Identity and Semantic Attributes via DiT Modulation · 7 authors · submitted by Mengyi · 23 upvotes
ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models · 14 authors · submitted by THUdyh · 19 upvotes
From Ideal to Real: Unified and Data-Efficient Dense Prediction for Real-World Scenarios · 4 authors · submitted by ChengyouJia · 15 upvotes
Pangu Pro MoE: Mixture of Grouped Experts for Efficient Sparsity · 22 authors · submitted by AdinaY · 10 upvotes
Shape-for-Motion: Precise and Consistent Video Editing with 3D Proxy · 5 authors · submitted by LeoLau · 9 upvotes
Fine-Grained Preference Optimization Improves Spatial Reasoning in VLMs · 9 authors · submitted by SivanSX · 9 upvotes
The Automated LLM Speedrunning Benchmark: Reproducing NanoGPT Improvements · 23 authors · submitted by tennant · 8 upvotes
Gazal-R1: Achieving State-of-the-Art Medical Reasoning with Parameter-Efficient Two-Stage Training · 3 authors · submitted by AhmedMostafa · 5 upvotes
Noise Consistency Training: A Native Approach for One-Step Generator in Learning Additional Controls · 4 authors · submitted by Luo-Yihong · 4 upvotes
Confucius3-Math: A Lightweight High-Performance Reasoning LLM for Chinese K-12 Mathematics Learning · 5 authors · submitted by nomadlx · 4 upvotes
SMMILE: An Expert-Driven Benchmark for Multimodal Medical In-Context Learning · 12 authors · submitted by mdmoor · 3 upvotes
Performance Prediction for Large Systems via Text-to-Text Regression · 10 authors · submitted by Srizzle · 2 upvotes
RetFiner: A Vision-Language Refinement Scheme for Retinal Foundation Models · 4 authors · submitted by j-morano · 1 upvote
Global and Local Entailment Learning for Natural World Imagery · 5 authors · submitted by Srikumar26 · 1 upvote
GPAS: Accelerating Convergence of LLM Pretraining via Gradient-Preserving Activation Scaling · 15 authors · submitted by pengxiang