DiffThinker: Towards Generative Multimodal Reasoning with Diffusion Models Paper • 2512.24165 • Published Dec 2025 • 20
Next-Embedding Prediction Makes Strong Vision Learners Paper • 2512.16922 • Published Dec 2025 • 82
Towards Scalable Pre-training of Visual Tokenizers for Generation Paper • 2512.13687 • Published Dec 2025 • 98
From Pixels to Feelings: Aligning MLLMs with Human Cognitive Perception of Images Paper • 2511.22805 • Published Nov 27, 2025 • 3
Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer Paper • 2511.22699 • Published Nov 27, 2025 • 220
REASONEDIT: Towards Reasoning-Enhanced Image Editing Models Paper • 2511.22625 • Published Nov 27, 2025 • 46
Kimi Linear: An Expressive, Efficient Attention Architecture Paper • 2510.26692 • Published Oct 30, 2025 • 119
The End of Manual Decoding: Towards Truly End-to-End Language Models Paper • 2510.26697 • Published Oct 30, 2025 • 116
Better Together: Leveraging Unpaired Multimodal Data for Stronger Unimodal Models Paper • 2510.08492 • Published Oct 9, 2025 • 8
Diffusion Transformers with Representation Autoencoders Paper • 2510.11690 • Published Oct 13, 2025 • 165
Learning to See Before Seeing: Demystifying LLM Visual Priors from Language Pre-training Paper • 2509.26625 • Published Sep 30, 2025 • 43
Residual Off-Policy RL for Finetuning Behavior Cloning Policies Paper • 2509.19301 • Published Sep 23, 2025 • 18
SPATIALGEN: Layout-guided 3D Indoor Scene Generation Paper • 2509.14981 • Published Sep 18, 2025 • 27