DiT-Air: Revisiting the Efficiency of Diffusion Model Architecture Design in Text to Image Generation Paper • 2503.10618 • Published Mar 13 • 18
STIV: Scalable Text and Image Conditioned Video Generation Paper • 2412.07730 • Published Dec 10, 2024 • 75
MLLM-guided Image Editing (MGIE) • Transform images based on textual instructions • Running on Zero • 328
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models Paper • 2404.07973 • Published Apr 11, 2024 • 33