TACA: Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers
1The University of Hong Kong
2Nanjing University
3University of Chinese Academy of Sciences
4Nanyang Technological University
5Harbin Institute of Technology
(*Equal Contribution. ‡Project Leader. †Corresponding Author.)
Paper | Project Page | LoRA Weights | Code
About
We propose TACA, a parameter-efficient method that dynamically rebalances cross-modal attention in multimodal diffusion transformers to improve text-image alignment.
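To make the idea concrete, here is a minimal, self-contained sketch of rebalancing cross-modal attention in a joint text–image token sequence: the attention logits directed at text tokens are scaled before the softmax, shifting attention mass toward the text condition. The function name, the `text_scale` parameter, and this exact scaling scheme are illustrative assumptions for exposition, not the published TACA formulation.

```python
import torch
import torch.nn.functional as F

def rebalanced_joint_attention(q, k, v, num_text_tokens, text_scale=1.0):
    """Joint attention over a concatenated [text; image] token sequence.

    Logits toward the first `num_text_tokens` keys are multiplied by
    `text_scale` before softmax, so text_scale > 1 shifts attention
    toward the text tokens. Illustrative sketch only.
    """
    d = q.shape[-1]
    logits = (q @ k.transpose(-2, -1)) / d ** 0.5  # (..., L, L)
    logits[..., :num_text_tokens] *= text_scale    # rebalance text columns
    return F.softmax(logits, dim=-1) @ v

# Toy usage: 3 text tokens followed by 5 image tokens, head dim 16.
torch.manual_seed(0)
q, k, v = (torch.randn(8, 16) for _ in range(3))
out = rebalanced_joint_attention(q, k, v, num_text_tokens=3, text_scale=1.5)
```

With `text_scale=1.0` this reduces to standard scaled dot-product attention; in the actual model the rebalancing is learned parameter-efficiently (the released weights are a LoRA on FLUX.1-dev) rather than set by a fixed scalar.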
Model tree for ldiex/TACA
Base model: black-forest-labs/FLUX.1-dev