The Ultra-Scale Playbook 🌌 The ultimate guide to training LLMs on large GPU clusters
Article: From Chunks to Blocks: Accelerating Uploads and Downloads on the Hub (10 days ago)
Article: Introducing multi-backends (TRT-LLM, vLLM) support for Text Generation Inference (Jan 16)
Article: Train 400x faster Static Embedding Models with Sentence Transformers (Jan 15)
Post: A while ago I started experimenting with compiling the Python interpreter to WASM, to build a secure, fast, and lightweight sandbox for code execution, ideal for running LLM-generated Python code.
- Send code simply as a POST request
- 1-2ms startup times
Hack away: https://github.com/ErikKaum/runner
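The post's workflow can be sketched as a small client that POSTs Python source to the sandbox. This is a minimal sketch only: the endpoint path `/run`, port, and the `{"code": ...}` payload shape are assumptions for illustration, not the runner's documented API.

```python
import json
import urllib.request

# Hypothetical endpoint; the actual runner's URL and route may differ.
RUNNER_URL = "http://localhost:8000/run"

def build_run_request(source: str) -> urllib.request.Request:
    """Build a POST request shipping Python source to the sandbox.

    The JSON field name "code" is an assumed payload shape.
    """
    payload = json.dumps({"code": source}).encode("utf-8")
    return urllib.request.Request(
        RUNNER_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_run_request("print(2 + 2)")
# Actually sending requires a running sandbox, so it is left commented out:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode())
```

Keeping the sandbox behind a plain HTTP POST is what makes the 1-2ms startup claim interesting: the client side stays trivial, and all isolation lives in the WASM runtime.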
Scaling FineWeb to 1000+ languages: Step 1: finding signal in 100s of evaluation tasks 📝 Evaluate multilingual models using FineTasks
Article: Releasing Outlines-core 0.1.0: structured generation in Rust and Python (Oct 22, 2024)
Post: This week in Inference Endpoints, thanks @erikkaum for the update! 👀 https://huggingface.co/blog/erikkaum/endpoints-changelog