Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks Paper • 2501.08326 • Published 28 days ago • 31
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders Paper • 2408.15998 • Published Aug 28, 2024 • 86
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders Paper • 2408.15998 • Published Aug 28, 2024 • 86
LITA: Language Instructed Temporal-Localization Assistant Paper • 2403.19046 • Published Mar 27, 2024 • 19
Efficient Video Diffusion Models via Content-Frame Motion-Latent Decomposition Paper • 2403.14148 • Published Mar 21, 2024 • 19
T-Stitch: Accelerating Sampling in Pre-Trained Diffusion Models with Trajectory Stitching Paper • 2402.14167 • Published Feb 21, 2024 • 11
Eureka: Human-Level Reward Design via Coding Large Language Models Paper • 2310.12931 • Published Oct 19, 2023 • 26