Article NVIDIA's GTC 2025 Announcement for Physical AI Developers: New Open Models and Datasets 7 days ago • 29
Article Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM 13 days ago • 342
Shot categorizer Collection Fine-tune of Florence-2 to generate shot categories, useful for data curation. Code: https://github.com/huggingface/movie-shot-categorizer. • 3 items • Updated 18 days ago • 2
C4AI Aya Vision Collection Aya Vision is a state-of-the-art family of vision models that brings multimodal capabilities to 23 languages. • 5 items • Updated 20 days ago • 68
Article A Deepdive into Aya Vision: Advancing the Frontier of Multilingual Multimodality 21 days ago • 69
Article HuggingFace, IISc partner to supercharge model building on India's diverse languages 26 days ago • 17
Phi-4 Collection Phi-4 family of small language and multi-modal models. • 7 items • Updated 21 days ago • 110
SigLIP 2 Collection OpenCLIP and timm SigLIP 2 models • 45 items • Updated about 1 month ago • 14
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published Feb 20 • 133
Article PaliGemma 2 Mix - New Instruction Vision Language Models by Google Feb 19 • 65