V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning Paper • 2506.09985 • Published 15 days ago • 26
ATLAS: Learning to Optimally Memorize the Context at Test Time Paper • 2505.23735 • Published 28 days ago • 23
Revisiting Residual Connections: Orthogonal Updates for Stable and Efficient Deep Networks Paper • 2505.11881 • Published May 17 • 4
Efficient Generative Model Training via Embedded Representation Warmup Paper • 2504.10188 • Published Apr 14 • 12
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach Paper • 2502.05171 • Published Feb 7 • 142
VideoRoPE: What Makes for Good Video Rotary Position Embedding? Paper • 2502.05173 • Published Feb 7 • 65
VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models Paper • 2502.02492 • Published Feb 4 • 65
Learnings from Scaling Visual Tokenizers for Reconstruction and Generation Paper • 2501.09755 • Published Jan 16 • 37