Emerging Properties in Unified Multimodal Pretraining Paper • 2505.14683 • Published May 20
TAPTRv3: Spatial and Temporal Context Foster Robust Tracking of Any Point in Long Video Paper • 2411.18671 • Published Nov 27, 2024
lmms-lab/llava-next-interleave-qwen-0.5b Text Generation • 0.9B • Updated Jul 12, 2024
lmms-lab/llava-next-interleave-qwen-7b-dpo Text Generation • 8B • Updated Jul 12, 2024
lmms-lab/llava-next-interleave-qwen-7b Text Generation • 8B • Updated Jul 24, 2024
LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models Paper • 2407.07895 • Published Jul 10, 2024