Sekai: A Video Dataset towards World Exploration Paper • 2506.15675 • Published 12 days ago • 61
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention Paper • 2506.13585 • Published 14 days ago • 248
SRPO: Enhancing Multimodal LLM Reasoning via Reflection-Aware Reinforcement Learning Paper • 2506.01713 • Published 28 days ago • 46
Beyond Distillation: Pushing the Limits of Medical LLM Reasoning with Minimalist Rule-Based RL Paper • 2505.17952 • Published May 23 • 21
MoCha: Towards Movie-Grade Talking Character Synthesis Paper • 2503.23307 • Published Mar 30 • 136
TextCrafter: Accurately Rendering Multiple Texts in Complex Visual Scenes Paper • 2503.23461 • Published Mar 30 • 95
FLUX.1 Collection A collection of our FLUX.1 models and LoRAs. • 9 items • Updated 4 days ago • 123
MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models (VLMs) via Reinforcement Learning Paper • 2502.19634 • Published Feb 26 • 63
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published Feb 20 • 145
Chirpy3D: Continuous Part Latents for Creative 3D Bird Generation Paper • 2501.04144 • Published Jan 7 • 19
Qwen2-VL Collection Vision-language model series based on Qwen2 • 16 items • Updated Apr 28 • 220
AI Paper of the Day Collection A collection of papers that I think are interesting, one added each day • 398 items • Updated about 6 hours ago • 52