Article Blazingly fast whisper transcriptions with Inference Endpoints By mfuntowicz and 5 others • May 13 • 70
Article The New and Fresh analytics in Inference Endpoints By erikkaum and 4 others • Mar 21 • 21
Article Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM By ariG23498 and 3 others • Mar 12 • 439
Running 2.75k The Ultra-Scale Playbook 🌌 The ultimate guide to training LLM on large GPU Clusters
Article From Chunks to Blocks: Accelerating Uploads and Downloads on the Hub By jsulz and 3 others • Feb 12 • 68
Article Introducing multi-backends (TRT-LLM, vLLM) support for Text Generation Inference By mfuntowicz and 1 other • Jan 16 • 75
Article Train 400x faster Static Embedding Models with Sentence Transformers By tomaarsen • Jan 15 • 195
Post 1798 A while ago I started experimenting with compiling the Python interpreter to WASM, to build a secure, fast, and lightweight sandbox for code execution, ideal for running LLM-generated Python code.
- Send code simply as a POST request
- 1-2 ms startup times
Hack away: https://github.com/ErikKaum/runner
Running 67 Scaling FineWeb to 1000+ languages: Step 1: finding signal in 100s of evaluation tasks 📝 Evaluate multilingual models using FineTasks
Article Releasing Outlines-core 0.1.0: structured generation in Rust and Python By erikkaum and 6 others • Oct 22, 2024 • 44
Post 1113 This week in Inference Endpoints - thx @erikkaum for the update! 👀 https://huggingface.co/blog/erikkaum/endpoints-changelog