OpenSTL: A Comprehensive Benchmark of Spatio-Temporal Predictive Learning • arXiv:2306.11249 • Published Jun 20, 2023
Taming LLMs by Scaling Learning Rates with Gradient Grouping • arXiv:2506.01049 • Published Jun 1, 2025
CoDA: Coordinated Diffusion Noise Optimization for Whole-Body Manipulation of Articulated Objects • arXiv:2505.21437 • Published May 27, 2025
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time • arXiv:2505.24863 • Published May 30, 2025
Beyond Recognition: Evaluating Visual Perspective Taking in Vision Language Models • arXiv:2505.03821 • Published May 3, 2025
Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation • arXiv:2504.17207 • Published Apr 24, 2025
Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models • arXiv:2504.17789 • Published Apr 24, 2025
Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs • arXiv:2504.17432 • Published Apr 24, 2025
AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction • arXiv:2504.01014 • Published Apr 1, 2025
Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation • arXiv:2504.02542 • Published Apr 3, 2025
MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization • arXiv:2504.00999 • Published Apr 1, 2025
I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models • arXiv:2502.10458 • Published Feb 12, 2025
OpenMixup: Open Mixup Toolbox and Benchmark for Visual Representation Learning • arXiv:2209.04851 • Published Sep 11, 2022
Switch EMA: A Free Lunch for Better Flatness and Sharpness • arXiv:2402.09240 • Published Feb 14, 2024
SemiReward: A General Reward Model for Semi-supervised Learning • arXiv:2310.03013 • Published Oct 4, 2023
Unveiling the Backbone-Optimizer Coupling Bias in Visual Representation Learning • arXiv:2410.06373 • Published Oct 8, 2024