How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks • Paper • 2507.01955 • Published 8 days ago
Open-source DeepResearch – Freeing our search agents • Article • By m-ric and 4 others • Feb 4
SmolVLM2: Bringing Video Understanding to Every Device • Article • By orrzohar and 6 others • Feb 20
SmolVLM - small yet mighty Vision Language Model • Article • By andito and 4 others • Nov 26, 2024
ColPali: Efficient Document Retrieval with Vision Language Models 👀 • Article • By manu • Jul 5, 2024
DeepSearch Using Visual RAG in Agentic Frameworks 🔎 • Article • By paultltc and 1 other • Mar 21
Reinforcement Learning for Large Language Models: Beyond the Agent Paradigm • Article • By royswastik • Mar 19