DiffMoE: Dynamic Token Selection for Scalable Diffusion Transformers
Abstract
Diffusion models have demonstrated remarkable success across a range of image generation tasks, but their performance is often limited by uniform processing of inputs regardless of condition or noise level. To address this limitation, we propose a novel approach that leverages the inherent heterogeneity of the diffusion process. Our method, DiffMoE, introduces a batch-level global token pool that exposes experts to the global token distribution during training, promoting specialized expert behavior. To unleash the full potential of the diffusion process, DiffMoE incorporates a capacity predictor that dynamically allocates computational resources based on noise levels and sample complexity. In comprehensive evaluations, DiffMoE achieves state-of-the-art performance among diffusion models on the ImageNet benchmark, substantially outperforming both dense architectures with 3x activated parameters and existing MoE approaches while maintaining 1x activated parameters. The effectiveness of our approach extends beyond class-conditional generation to more challenging tasks such as text-to-image generation, demonstrating its broad applicability across diffusion model applications. Project Page: https://shiml20.github.io/DiffMoE/
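To make the two mechanisms in the abstract concrete, below is a minimal PyTorch sketch of batch-level global token routing combined with a learned capacity predictor. This is an illustration under stated assumptions, not the paper's actual implementation: the class and member names (`BatchLevelMoE`, `capacity_predictor`) are hypothetical, and the mean token is used here as a stand-in for the noise-level and sample-complexity conditioning that DiffMoE's predictor actually uses.

```python
# Minimal sketch (assumptions noted below), not the authors' code.
import torch
import torch.nn as nn

class BatchLevelMoE(nn.Module):
    """Routes tokens from a flattened batch-level pool to experts, so each
    expert selects from the global token distribution of the whole batch
    rather than a fixed per-sample quota."""

    def __init__(self, dim: int, num_experts: int, hidden_dim: int):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, hidden_dim), nn.GELU(),
                           nn.Linear(hidden_dim, dim))
             for _ in range(num_experts)]
        )
        # Hypothetical capacity predictor: maps a pooled feature to a
        # per-expert fraction of the token pool to process. In DiffMoE this
        # conditions on noise level and sample complexity.
        self.capacity_predictor = nn.Linear(dim, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, S, D = x.shape
        tokens = x.reshape(B * S, D)                # batch-level global token pool
        probs = self.router(tokens).softmax(dim=-1)  # (B*S, num_experts)

        # Predict each expert's capacity fraction from the mean token
        # (assumption: a simple proxy for the conditioning signal).
        cap_frac = self.capacity_predictor(tokens.mean(dim=0)).sigmoid()

        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            # Dynamic capacity: top-k tokens for this expert, k set by the
            # predictor. Note: k is non-differentiable here; in practice the
            # predictor would need an auxiliary training objective (assumption).
            k = max(1, int(cap_frac[e].item() * tokens.size(0)))
            scores, idx = probs[:, e].topk(k)
            out[idx] += scores.unsqueeze(-1) * expert(tokens[idx])
        return out.reshape(B, S, D)

# Usage: a toy batch of 2 sequences of 16 tokens with dim 64.
moe = BatchLevelMoE(dim=64, num_experts=4, hidden_dim=256)
y = moe(torch.randn(2, 16, 64))  # -> (2, 16, 64)
```

The design point mirrored here is the batch-level pool: tokens from all samples are flattened before routing, so expert selection reflects the global token distribution, and the predicted capacities let compute vary with the input rather than being fixed per token.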
Community
TL;DR: We propose DiffMoE to efficiently scale Diffusion Transformers, matching the performance of a dense model with 3x activated parameters while using only 1x active params.
This is an automated message from the Librarian Bot. The following papers, recommended by the Semantic Scholar API, are similar to this paper:
- DiT-Air: Revisiting the Efficiency of Diffusion Model Architecture Design in Text to Image Generation (2025)
- Upcycling Text-to-Image Diffusion Models for Multi-Task Capabilities (2025)
- FlexControl: Computation-Aware ControlNet with Differentiable Router for Text-to-Image Generation (2025)
- Diffusion-Sharpening: Fine-tuning Diffusion Models with Denoising Trajectory Sharpening (2025)
- RelaCtrl: Relevance-Guided Efficient Control for Diffusion Transformers (2025)
- OminiControl2: Efficient Conditioning for Diffusion Transformers (2025)
- Underlying Semantic Diffusion for Effective and Efficient In-Context Learning (2025)