TACA: Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers

1The University of Hong Kong       2Nanjing University
3University of Chinese Academy of Sciences       4Nanyang Technological University
5Harbin Institute of Technology
(*Equal Contribution.    ‡Project Leader.    †Corresponding Author.)

Paper | Project Page | LoRA Weights | Code

About

We propose TACA, a parameter-efficient method that dynamically rebalances cross-modal attention in multimodal diffusion transformers to improve text-image alignment.
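
Since this repository ships LoRA weights, a typical way to try them is through diffusers. The snippet below is a minimal sketch, not taken from this card: it assumes the LoRA targets a diffusers-compatible multimodal DiT base such as FLUX.1-dev, loads cleanly with `load_lora_weights`, and the generation settings are placeholders.

```python
# Minimal usage sketch (assumptions: FLUX.1-dev as the base MM-DiT,
# default LoRA filename in the ldiex/TACA repo, bfloat16 on a CUDA GPU).
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
# Load the TACA LoRA; the exact weight filename may need to be passed
# via `weight_name=...` if the repo contains multiple variants.
pipe.load_lora_weights("ldiex/TACA")
pipe.to("cuda")

# A prompt that stresses text-image alignment (attribute binding / spatial layout).
image = pipe(
    "a red cube on top of a blue sphere",
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("taca_sample.png")
```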
