V-JEPA 2 Collection A frontier video understanding model developed by FAIR, Meta, which extends the pretraining objectives of https://ai.meta.com/blog/v-jepa-yann • 8 items • Updated 26 days ago • 144
UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation Paper • 2506.03147 • Published Jun 3 • 58
Article Vision Language Models (Better, Faster, Stronger) By merve and 4 others • May 12 • 475
D-FINE Collection State-of-the-art real-time object detection model with Apache 2.0 licence • 15 items • Updated May 5 • 55
Article SigLIP 2: A better multilingual vision language encoder By ariG23498 and 2 others • Feb 21 • 174
Article FastRTC: The Real-Time Communication Library for Python By freddyaboulton and 1 other • Feb 25 • 169
Article Open-source DeepResearch – Freeing our search agents By m-ric and 4 others • Feb 4 • 1.27k
Article 🚀 Build a Qwen 2.5 VL API endpoint with Hugging Face spaces and Docker! By ariG23498 • Jan 29 • 19
Article Welcome to Inference Providers on the Hub 🔥 By julien-c and 6 others • Jan 28 • 483
Article The SOTA Text-to-speech and Zero Shot Voice cloning model that no one knows about... By srinivasbilla • Jan 20 • 68
Article Train 400x faster Static Embedding Models with Sentence Transformers By tomaarsen • Jan 15 • 195
Article Halo: Open Source Health Tracking with Wearables By cyrilzakka • Nov 19, 2024 • 112
Cosmos-Tokenizer Collection A suite of image and video tokenizers • 13 items • Updated 2 days ago • 40
LLM2CLIP Collection LLM2CLIP makes SOTA pretrained CLIP model more SOTA ever. • 11 items • Updated May 1 • 61