TACA: Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers

1The University of Hong Kong       2Nanjing University
3University of Chinese Academy of Sciences       4Nanyang Technological University
5Harbin Institute of Technology
(*Equal Contribution.    ‡Project Leader.    †Corresponding Author.)

Paper | Project Page | LoRA Weights | Code

About

We propose TACA, a parameter-efficient method that dynamically rebalances cross-modal attention in multimodal diffusion transformers to improve text-image alignment.
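
Since this repository ships LoRA weights, a typical way to try them is through diffusers. The snippet below is a minimal sketch, not taken from this card: it assumes the LoRA targets a diffusers-compatible multimodal DiT base such as FLUX.1-dev, loads cleanly with `load_lora_weights`, and the generation settings are placeholders.

```python
# Minimal usage sketch (assumptions: FLUX.1-dev as the base MM-DiT,
# default LoRA filename in the ldiex/TACA repo, bfloat16 on a CUDA GPU).
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
# Load the TACA LoRA; the exact weight filename may need to be passed
# via `weight_name=...` if the repo contains multiple variants.
pipe.load_lora_weights("ldiex/TACA")
pipe.to("cuda")

# A prompt that stresses text-image alignment (attribute binding / spatial layout).
image = pipe(
    "a red cube on top of a blue sphere",
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("taca_sample.png")
```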
