YOLO-Master: MOE-Accelerated with Specialized Transformers for Enhanced Real-time Detection • Paper • 2512.23273 • Published 1 day ago • 7
Diffusion Knows Transparency: Repurposing Video Diffusion for Transparent Object Depth and Normal Estimation • Paper • 2512.23705 • Published about 22 hours ago • 31
Yume-1.5: A Text-Controlled Interactive World Generation Model • Paper • 2512.22096 • Published 4 days ago • 48
Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning • Paper • 2512.20605 • Published 7 days ago • 56
Spatia: Video Generation with Updatable Spatial Memory • Paper • 2512.15716 • Published 13 days ago • 28
HiStream: Efficient High-Resolution Video Generation via Redundancy-Eliminated Streaming • Paper • 2512.21338 • Published 6 days ago • 20
Learning from Next-Frame Prediction: Autoregressive Video Modeling Encodes Effective Representations • Paper • 2512.21004 • Published 6 days ago • 12
TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times • Paper • 2512.16093 • Published 13 days ago • 84
The Prism Hypothesis: Harmonizing Semantic and Pixel Representations via Unified Autoencoding • Paper • 2512.19693 • Published 8 days ago • 61
Both Semantics and Reconstruction Matter: Making Representation Encoders Ready for Text-to-Image Generation and Editing • Paper • 2512.17909 • Published 11 days ago • 36
Next-Embedding Prediction Makes Strong Vision Learners • Paper • 2512.16922 • Published 12 days ago • 81
REGLUE Your Latents with Global and Local Semantics for Entangled Diffusion • Paper • 2512.16636 • Published 12 days ago • 25
DEER: Draft with Diffusion, Verify with Autoregressive Models • Paper • 2512.15176 • Published 13 days ago • 41
Fast and Accurate Causal Parallel Decoding using Jacobi Forcing • Paper • 2512.14681 • Published 14 days ago • 39
WorldPlay: Towards Long-Term Geometric Consistency for Real-Time Interactive World Modeling • Paper • 2512.14614 • Published 14 days ago • 66
Towards Scalable Pre-training of Visual Tokenizers for Generation • Paper • 2512.13687 • Published 15 days ago • 97
LongVie 2: Multimodal Controllable Ultra-Long Video World Model • Paper • 2512.13604 • Published 15 days ago • 71
EgoX: Egocentric Video Generation from a Single Exocentric Video • Paper • 2512.08269 • Published 21 days ago • 114