LaViDa: A Large Diffusion Language Model for Multimodal Understanding Paper • 2505.16839 • Published May 22 • 12
LaViDa-1.0 Collection LArge VIsion-language Diffusion moDel with mAsking • 11 items • Updated May 26 • 7
Reflect-DiT: Inference-Time Scaling for Text-to-Image Diffusion Transformers via In-Context Reflection Paper • 2503.12271 • Published Mar 15 • 9
OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows Paper • 2412.01169 • Published Dec 2, 2024 • 13