The Ultra-Scale Playbook 🌌 The ultimate guide to training LLMs on large GPU clusters
Article: From Chunks to Blocks: Accelerating Uploads and Downloads on the Hub (10 days ago)
Article: Introducing multi-backends (TRT-LLM, vLLM) support for Text Generation Inference (Jan 16)
Article: Train 400x faster Static Embedding Models with Sentence Transformers (Jan 15)
Post: A while ago I started experimenting with compiling the Python interpreter to WASM, to build a secure, fast, and lightweight sandbox for code execution, ideal for running LLM-generated Python code.
- Send code simply as a POST request
- 1-2ms startup times
Hack away: https://github.com/ErikKaum/runner
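The post's workflow can be sketched as a small client that POSTs Python source to the sandbox. This is a minimal sketch only: the endpoint path `/run`, port, and the `{"code": ...}` payload shape are assumptions for illustration, not the runner's documented API.

```python
import json
import urllib.request

# Hypothetical endpoint; the actual runner's URL and route may differ.
RUNNER_URL = "http://localhost:8000/run"

def build_run_request(source: str) -> urllib.request.Request:
    """Build a POST request shipping Python source to the sandbox.

    The JSON field name "code" is an assumed payload shape.
    """
    payload = json.dumps({"code": source}).encode("utf-8")
    return urllib.request.Request(
        RUNNER_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_run_request("print(2 + 2)")
# Actually sending requires a running sandbox, so it is left commented out:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode())
```

Keeping the sandbox behind a plain HTTP POST is what makes the 1-2ms startup claim interesting: the client side stays trivial, and all isolation lives in the WASM runtime.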
Scaling FineWeb to 1000+ languages: Step 1: finding signal in 100s of evaluation tasks 📝 Evaluate multilingual models using FineTasks
Article: Releasing Outlines-core 0.1.0: structured generation in Rust and Python (Oct 22, 2024)
Post: This week in Inference Endpoints, thanks @erikkaum for the update! 👀 https://huggingface.co/blog/erikkaum/endpoints-changelog