Article DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge By NormalUhr • Feb 7 • 212
Article CinePile 2.0 - making stronger datasets with adversarial refinement By mfarre and 3 others • Oct 23, 2024 • 18
Article TimeScope: How Long Can Your Video Large Multimodal Model Go? By orrzohar and 3 others • Jul 23 • 40
Article PaliGemma – Google's Cutting-Edge Open Vision Language Model By merve and 2 others • May 14, 2024 • 266
V-JEPA 2 Collection A frontier video understanding model developed by FAIR, Meta, which extends the pretraining objectives of https://ai.meta.com/blog/v-jepa-yann • 8 items • Updated Jun 13 • 161
UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation Paper • 2506.03147 • Published Jun 3 • 58
Article Vision Language Models (Better, Faster, Stronger) By merve and 4 others • May 12 • 523
D-FINE Collection State-of-the-art real-time object detection model with Apache 2.0 licence • 15 items • Updated May 5 • 55
Article SigLIP 2: A better multilingual vision language encoder By ariG23498 and 2 others • Feb 21 • 181
Article FastRTC: The Real-Time Communication Library for Python By freddyaboulton and 1 other • Feb 25 • 172
Article Open-source DeepResearch – Freeing our search agents By m-ric and 4 others • Feb 4 • 1.3k
Article Build a Qwen 2.5 VL API endpoint with Hugging Face spaces and Docker! By ariG23498 • Jan 29 • 19
Article Welcome to Inference Providers on the Hub 🔥 By julien-c and 6 others • Jan 28 • 488
Executable Code Actions Elicit Better LLM Agents Paper • 2402.01030 • Published Feb 1, 2024 • 170
Article The SOTA Text-to-speech and Zero Shot Voice cloning model that no one knows about... By srinivasbilla • Jan 20 • 72