TimeScope: How Long Can Your Video Large Multimodal Model Go? By orrzohar and 3 others • 24 days ago
SmolVLA: Efficient Vision-Language-Action Model trained on Lerobot Community Data By danaaubakirova and 8 others • Jun 3
nanoVLM: The simplest repository to train your VLM in pure PyTorch By ariG23498 and 6 others • May 21
Vision Language Models (Better, Faster, Stronger) By merve and 4 others • May 12
SmolVLM2: Bringing Video Understanding to Every Device By orrzohar and 6 others • Feb 20
SmolVLM Grows Smaller - Introducing the 250M & 500M Models! By andito and 2 others • Jan 23
SmolVLM - small yet mighty Vision Language Model By andito and 4 others • Nov 26, 2024
Deploying Speech-to-Speech on Hugging Face By andito and 3 others • Oct 22, 2024
LAVE: Zero-shot VQA Evaluation on Docmatix with LLMs - Do We Still Need Fine-Tuning? By danaaubakirova and 1 other • Jul 25, 2024
Docmatix - a huge dataset for Document Visual Question Answering By andito and 1 other • Jul 18, 2024
Fine-tuning Florence-2 - Microsoft's Cutting-edge Vision Language Models By andito and 2 others • Jun 24, 2024