Article Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM 16 days ago • 351
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published Feb 20 • 138
Scaling Pre-training to One Hundred Billion Data for Vision Language Models Paper • 2502.07617 • Published Feb 11 • 29
Article From Chunks to Blocks: Accelerating Uploads and Downloads on the Hub Feb 12 • 52
DepthPro Models Collection Depth Pro: Sharp Monocular Metric Depth in Less Than a Second • 4 items • Updated Feb 7 • 7
ViTPose Collection Collection for ViTPose models based on transformers implementation. • 10 items • Updated Jan 12 • 13