Image and Video Tokenization with Binary Spherical Quantization Paper • 2406.07548 • Published Jun 11, 2024
QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation Paper • 2502.05178 • Published Feb 7, 2025 • 10
Distilling Vision-Language Models on Millions of Videos Paper • 2401.06129 • Published Jan 11, 2024 • 17
LEAP: Liberate Sparse-view 3D Modeling from Camera Poses Paper • 2310.01410 • Published Oct 2, 2023 • 1
VideoPrism: A Foundational Visual Encoder for Video Understanding Paper • 2402.13217 • Published Feb 20, 2024 • 30