Causal Diffusion Transformers for Generative Modeling Paper • 2412.12095 • Published Dec 16, 2024 • 23
SnapGen: Taming High-Resolution Text-to-Image Models for Mobile Devices with Efficient Architectures and Training Paper • 2412.09619 • Published Dec 12, 2024 • 28
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation Paper • 2412.07589 • Published Dec 10, 2024 • 49
Flowing from Words to Pixels: A Framework for Cross-Modality Evolution Paper • 2412.15213 • Published Dec 19, 2024 • 29
CLEAR: Conv-Like Linearization Revs Pre-Trained Diffusion Transformers Up Paper • 2412.16112 • Published Dec 20, 2024 • 23
Democratizing Text-to-Image Masked Generative Models with Compact Text-Aware One-Dimensional Tokens Paper • 2501.07730 • Published Jan 13 • 17
SANA 1.5: Efficient Scaling of Training-Time and Inference-Time Compute in Linear Diffusion Transformer Paper • 2501.18427 • Published Jan 30 • 18
I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models Paper • 2502.10458 • Published Feb 12 • 35
EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling Paper • 2502.09509 • Published Feb 13 • 7
LDGen: Enhancing Text-to-Image Synthesis via Large Language Model-Driven Language Representation Paper • 2502.18302 • Published Feb 25 • 5
How far can we go with ImageNet for Text-to-Image generation? Paper • 2502.21318 • Published Feb 28 • 25
RectifiedHR: Enable Efficient High-Resolution Image Generation via Energy Rectification Paper • 2503.02537 • Published Mar 4 • 11
Autoregressive Image Generation with Randomized Parallel Decoding Paper • 2503.10568 • Published Mar 13 • 8
DiT-Air: Revisiting the Efficiency of Diffusion Model Architecture Design in Text to Image Generation Paper • 2503.10618 • Published Mar 13 • 17
Neighboring Autoregressive Modeling for Efficient Visual Generation Paper • 2503.10696 • Published Mar 12 • 8
When Less is Enough: Adaptive Token Reduction for Efficient Image Representation Paper • 2503.16660 • Published 26 days ago • 71
Diffusion-4K: Ultra-High-Resolution Image Synthesis with Latent Diffusion Models Paper • 2503.18352 • Published 23 days ago • 6
TextCrafter: Accurately Rendering Multiple Texts in Complex Visual Scenes Paper • 2503.23461 • Published 17 days ago • 93
HiFlow: Training-free High-Resolution Image Generation with Flow-Aligned Guidance Paper • 2504.06232 • Published 8 days ago • 11
VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning Paper • 2504.07960 • Published 6 days ago • 39
GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation Paper • 2504.08736 • Published 5 days ago • 38