EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation Paper • 2411.08380 • Published Nov 13, 2024 • 27
Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis Paper • 2409.06135 • Published Sep 10, 2024 • 16
AVESFormer: Efficient Transformer Design for Real-Time Audio-Visual Segmentation Paper • 2408.01708 • Published Aug 3, 2024 • 4
Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation Paper • 2312.06462 • Published Dec 11, 2023