ZML just released a technical preview of their new Inference Engine: LLMD.
- Just a 2.4GB container, which means fast startup times and efficient autoscaling
- Cross-platform GPU support: works on both NVIDIA and AMD GPUs
- Written in Zig
I just tried it out, deployed it on Hugging Face Inference Endpoints, and wrote a quick guide. You can try it in like 5 minutes!
Hey! Is there a chance we could have a chat? I have a specific vision-text dataset I want to train Gemma 3n on without losing audio, but I'm a little stuck.