Flow-GRPO: Training Flow Matching Models via Online RL Paper • 2505.05470 • Published 6 days ago • 65
TEMPURA: Temporal Event Masked Prediction and Understanding for Reasoning in Action Paper • 2505.01583 • Published 11 days ago • 9
Science-T2I: Addressing Scientific Illusions in Image Synthesis Paper • 2504.13129 • Published 27 days ago • 3
Step1X-Edit: A Practical Framework for General Image Editing Paper • 2504.17761 • Published 20 days ago • 88
It's All Connected: A Journey Through Test-Time Memorization, Attentional Bias, Retention, and Online Optimization Paper • 2504.13173 • Published 27 days ago • 18
WORLDMEM: Long-term Consistent World Simulation with Memory Paper • 2504.12369 • Published 28 days ago • 32
GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation Paper • 2504.08736 • Published Apr 11 • 47
DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning Paper • 2504.07128 • Published Apr 2 • 84
Kimi-VL-A3B Collection Moonshot's efficient MoE VLMs, exceptional on agent, long-context, and thinking tasks • 6 items • Updated Apr 12 • 65
OmniSVG: A Unified Scalable Vector Graphics Generation Model Paper • 2504.06263 • Published Apr 8 • 160
Less-to-More Generalization: Unlocking More Controllability by In-Context Generation Paper • 2504.02160 • Published Apr 2 • 37
An Empirical Study of GPT-4o Image Generation Capabilities Paper • 2504.05979 • Published Apr 8 • 62
Science-T2I Collection Addressing Scientific Illusions in Image Synthesis • 10 items • Updated 17 days ago • 4
MoCha: Towards Movie-Grade Talking Character Synthesis Paper • 2503.23307 • Published Mar 30 • 133
Qwen2.5-Omni Collection End-to-End Omni (text, audio, image, video, and natural speech interaction) model based on Qwen2.5 • 5 items • Updated 14 days ago • 110
Modifying Large Language Model Post-Training for Diverse Creative Writing Paper • 2503.17126 • Published Mar 21 • 36