Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers (arXiv:2506.07986, published 5 days ago)
EMO2: End-Effector Guided Audio-Driven Avatar Video Generation (arXiv:2501.10687, published Jan 18)