InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models • Paper • 2504.10479 • Published 1 day ago
SmolVLM: Redefining small and efficient multimodal models • Paper • 2504.05299 • Published 8 days ago
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features • Paper • 2502.14786 • Published Feb 20