Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition Paper • 2412.09501 • Published Dec 12, 2024
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models Paper • 2403.18814 • Published Mar 27, 2024
LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models Paper • 2311.17043 • Published Nov 28, 2023