Article 🦸🏻#14: What Is MCP, and Why Is Everyone – Suddenly! – Talking About It? By Kseniase • Mar 17 • 291
Test-Time-Registers Collection A collection of models augmented with test-time registers, a training-free way to add register tokens to pretrained models. • 2 items • Updated 9 days ago • 3
V-JEPA 2 Collection A frontier video understanding model developed by FAIR, Meta, which extends the pretraining objectives of https://ai.meta.com/blog/v-jepa-yann • 8 items • Updated 6 days ago • 116
LayerSkip Collection Models continually pretrained using LayerSkip - https://arxiv.org/abs/2404.16710 • 8 items • Updated Nov 21, 2024 • 48
Article KV Cache from scratch in nanoVLM By ariG23498 and 4 others • 15 days ago • 71
FlowMo: Variance-Based Flow Guidance for Coherent Motion in Video Generation Paper • 2506.01144 • Published 18 days ago • 14
Article SmolVLA: Efficient Vision-Language-Action Model trained on Lerobot Community Data By danaaubakirova and 8 others • 16 days ago • 147
SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics Paper • 2506.01844 • Published 17 days ago • 96
Changelog Xet is now the default storage option for new users and organizations • 27 days ago • 66
Article The Transformers Library: standardizing model definitions By lysandre and 3 others • May 15 • 114
Article nanoVLM: The simplest repository to train your VLM in pure PyTorch By ariG23498 and 6 others • 29 days ago • 157
Article Microsoft and Hugging Face expand collaboration By jeffboudier and 2 others • May 19 • 22
MobileCLIP Models + DataCompDR Data Collection MobileCLIP: Mobile-friendly image-text models with SOTA zero-shot capabilities. DataCompDR: Improved datasets for training image-text SOTA models. • 22 items • Updated Oct 4, 2024 • 29
Article Improving Hugging Face Model Access for Kaggle Users By roseberryv and 4 others • May 14 • 29
Article Vision Language Models (Better, Faster, Stronger) By merve and 4 others • May 12 • 444
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models Paper • 2401.15947 • Published Jan 29, 2024 • 54