---
base_model:
- black-forest-labs/FLUX.1-dev
- stabilityai/stable-diffusion-3.5-medium
library_name: diffusers
license: mit
pipeline_tag: text-to-image
---

TACA: Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers

Zhengyao Lv*¹, Tianlin Pan*²,³, Chenyang Si²‡†, Zhaoxi Chen⁴, Wangmeng Zuo⁵, Ziwei Liu⁴†, Kwan-Yee K. Wong¹†

¹The University of Hong Kong ²Nanjing University
³University of Chinese Academy of Sciences ⁴Nanyang Technological University
⁵Harbin Institute of Technology

(*Equal Contribution. ‡Project Leader. †Corresponding Author.)

Paper | Project Page | LoRA Weights | Code

# About

We propose **TACA**, a parameter-efficient method that dynamically rebalances cross-modal attention in multimodal diffusion transformers to improve text-image alignment.
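
To make "rebalancing cross-modal attention" concrete, here is a minimal NumPy sketch of one way it could be done in a joint text-image attention layer. It is an illustration under stated assumptions, not the released implementation: the function names, the `gamma` factor, the timestep threshold, and the choice to scale only the image-to-text logits are all hypothetical and are not specified in this card.

```python
import numpy as np


def rebalanced_joint_attention(q, k, v, n_text, gamma=1.0):
    """Joint attention over [text; image] tokens with rescaled image-to-text logits.

    q, k, v: arrays of shape (n_tokens, d); the first n_text rows are text tokens.
    gamma:   hypothetical scaling factor (assumption); gamma=1.0 gives standard attention.
    """
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d)                  # (n_tokens, n_tokens)
    # Rescale only the block where image queries attend to text keys.
    logits[n_text:, :n_text] *= gamma
    logits -= logits.max(axis=-1, keepdims=True)   # numerical stability for softmax
    weights = np.exp(logits)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


def gamma_for_timestep(t, threshold=0.5, gamma_high=1.2):
    """Hypothetical schedule (assumption): rescale only while t in [0, 1] is at or above threshold."""
    return gamma_high if t >= threshold else 1.0


# Toy usage: 2 text tokens followed by 4 image tokens, feature size 4.
rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(6, 4)) for _ in range(3))
out, weights = rebalanced_joint_attention(
    q, k, v, n_text=2, gamma=gamma_for_timestep(0.9)
)
```

With `gamma=1.0` the function reduces to ordinary softmax attention, and rows belonging to text queries are never modified, so the change is confined to how image tokens weigh the prompt.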