Devil in the Number: Towards Robust Multi-modality Data Filter Paper • 2309.13770 • Published Sep 24, 2023
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark Paper • 2410.03051 • Published Oct 4, 2024 • 6
Video-MMLU: A Massive Multi-Discipline Lecture Understanding Benchmark Paper • 2504.14693 • Published Apr 20
Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis Paper • 2410.08261 • Published Oct 10, 2024 • 52
MovieChat: From Dense Token to Sparse Memory for Long Video Understanding Paper • 2307.16449 • Published Jul 31, 2023 • 16