Scaling Up Your Kernels: Large Kernel Design in ConvNets towards Universal Representations Paper • 2410.08049 • Published Oct 10, 2024 • 8
Explore the Limits of Omni-modal Pretraining at Scale Paper • 2406.09412 • Published Jun 13, 2024 • 11
InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions Paper • 2402.03040 • Published Feb 5, 2024 • 18
Meta-Transformer: A Unified Framework for Multimodal Learning Paper • 2307.10802 • Published Jul 20, 2023 • 44
Text-to-3D Generation with Bidirectional Diffusion using both 2D and 3D priors Paper • 2312.04963 • Published Dec 7, 2023 • 17
OneLLM: One Framework to Align All Modalities with Language Paper • 2312.03700 • Published Dec 6, 2023 • 24
UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition Paper • 2311.15599 • Published Nov 27, 2023 • 1
Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities Paper • 2401.14405 • Published Jan 25, 2024 • 13