---
base_model:
- black-forest-labs/FLUX.1-dev
- stabilityai/stable-diffusion-3.5-medium
library_name: diffusers
license: mit
pipeline_tag: text-to-image
---

# TACA: Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers

Zhengyao Lv<sup>\*1</sup>, Tianlin Pan<sup>\*2,3</sup>, Chenyang Si<sup>2‡†</sup>, Zhaoxi Chen<sup>4</sup>, Wangmeng Zuo<sup>5</sup>, Ziwei Liu<sup>4†</sup>, Kwan-Yee K. Wong<sup>1†</sup>

<sup>1</sup>The University of Hong Kong, <sup>2</sup>Nanjing University, <sup>3</sup>University of Chinese Academy of Sciences, <sup>4</sup>Nanyang Technological University, <sup>5</sup>Harbin Institute of Technology

(\*Equal contribution. ‡Project leader. †Corresponding author.)

Paper | Project Page | LoRA Weights | Code

# About

We propose **TACA**, a parameter-efficient method that dynamically rebalances cross-modal attention in multimodal diffusion transformers to improve text-image alignment.
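To make "rebalancing cross-modal attention" concrete, here is a minimal NumPy sketch of single-head joint attention over concatenated text and image tokens, the attention pattern used in MM-DiT blocks such as those in SD3.5 and FLUX. It assumes, for illustration only, that rebalancing means scaling the logits of image queries attending to text keys by a factor `gamma` during early (high-noise) timesteps. The names `rebalanced_joint_attention`, `gamma`, `t_thresh`, and the timestep convention are hypothetical and are not taken from the TACA code release.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Numerically stable softmax: subtract the max before exponentiating.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def rebalanced_joint_attention(
    q: np.ndarray,
    k: np.ndarray,
    v: np.ndarray,
    n_text: int,
    t: float,
    gamma: float = 1.2,
    t_thresh: float = 0.5,
):
    """Single-head joint attention over [text; image] tokens (illustrative sketch).

    q, k, v: arrays of shape (n_text + n_image, d); the first n_text rows are text tokens.
    t: normalized timestep in [0, 1], with 1 = pure noise (assumed convention).
    gamma, t_thresh: hypothetical rebalancing factor and timestep threshold.
    Returns (output, attention_weights).
    """
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d)  # (N, N) scaled dot-product scores
    if t >= t_thresh:
        # Assumed rule: scale image-query -> text-key logits by gamma in early,
        # high-noise steps; text-query rows and image-image logits are unchanged.
        logits[n_text:, :n_text] *= gamma
    weights = softmax(logits, axis=-1)
    return weights @ v, weights
```

Because the adjustment only touches attention logits, this sketch adds no trainable parameters. Note that with `gamma > 1`, positive image-to-text logits grow and negative ones shrink, so the net change in each text token's attention share depends on the sign of its logit.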