RADIO Collection A collection of Foundation Vision Models that combine multiple models (CLIP, DINOv2, SAM, etc.). • 13 items • Updated 2 days ago • 17
💫StarVector Models Collection StarVector is a multimodal LLM for Scalable Vector Graphics (SVG) generation, producing structured SVG code directly from images and text. • 2 items • Updated Mar 20 • 93
Article Introducing smolagents: simple agents that write actions in code. Dec 31, 2024 • 997
BiMediX2 Collection BiMediX2: Bio-Medical EXpert LMM for Diverse Medical Modalities • 5 items • Updated Dec 17, 2024 • 7
LayerSkip Collection Models continually pretrained using LayerSkip - https://arxiv.org/abs/2404.16710 • 8 items • Updated Nov 21, 2024 • 47
MedEmbed: Embedding Models for Medical Domain Collection GitHub -> https://github.com/abhinand5/MedEmbed • 4 items • Updated Oct 21, 2024 • 9
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second Paper • 2410.02073 • Published Oct 2, 2024 • 42
Moshi v0.1 Release Collection MLX, Candle & PyTorch model checkpoints released as part of the Moshi release from Kyutai. Run inference via: https://github.com/kyutai-labs/moshi • 15 items • Updated 9 days ago • 228
VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding Paper • 2407.12594 • Published Jul 17, 2024 • 19