Article • Introducing Three New Serverless Inference Providers: Hyperbolic, Nebius AI Studio, and Novita 🔥 • 3 days ago
Article • From Chunks to Blocks: Accelerating Uploads and Downloads on the Hub • 9 days ago
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model • Paper • 2502.02737 • Published 16 days ago
Streaming DiLoCo with overlapping communication: Towards a Distributed Free Lunch • Paper • 2501.18512 • Published 21 days ago
Article • Mini-R1: Reproduce Deepseek R1 "aha moment", an RL tutorial • By open-r1 • 20 days ago
Tulu 3 Models • Collection • All models released with Tulu 3 -- state-of-the-art open post-training recipes • 11 items • Updated 8 days ago
Article • Mastering Long Contexts in LLMs with KVPress • By nvidia and 1 other • 28 days ago
Article • How biased is Whisper? Evaluating Whisper Models for Robustness to Diverse English Accents • By Steveeeeeeen • 22 days ago
🧠 Reasoning datasets • Collection • Datasets with reasoning traces for math and code released by the community • 12 items • Updated about 11 hours ago
Qwen2.5-VL • Collection • Vision-language model series based on Qwen2.5 • 3 items • Updated 25 days ago