view article Article mmBERT: ModernBERT goes Multilingual By orionweller and 5 others • 15 days ago • 93
Running 3.24k 3.24k The Ultra-Scale Playbook 🌌 The ultimate guide to training LLM on large GPU Clusters
view article Article NVIDIA Releases 3 Million Sample Dataset for OCR, Visual Question Answering, and Captioning Tasks By nvidia and 4 others • Aug 11 • 73
V-JEPA 2 Collection A frontier video understanding model developed by FAIR, Meta, which extends the pretraining objectives of https://ai.meta.com/blog/v-jepa-yann • 8 items • Updated Jun 13 • 161
Building and better understanding vision-language models: insights and future directions Paper • 2408.12637 • Published Aug 22, 2024 • 133
Training Datasets Collection A collection of pseudo-labelled datasets used to train the Distil-Whisper model. • 9 items • Updated Mar 21, 2024 • 14
On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes Paper • 2306.13649 • Published Jun 23, 2023 • 25
Running on CPU Upgrade 13.6k 13.6k Open LLM Leaderboard 🏆 Track, rank and evaluate open LLMs and chatbots