Xenova posted an update Aug 23
I can't believe this... Phi-3.5-mini (3.8B) running in-browser at ~90 tokens/second on WebGPU w/ Transformers.js and ONNX Runtime Web! 🤯 Since everything runs 100% locally, no messages are sent to a server, a huge win for privacy!
- 🤗 Demo: webml-community/phi-3.5-webgpu
- 🧑‍💻 Source code: https://github.com/huggingface/transformers.js-examples/tree/main/phi-3.5-webgpu

Why in the world would it be cool to run something in a browser when it can run locally using llama.cpp??


Why in the world would you bother installing llama.cpp when you can just open a webpage?

Depends on the GPU hardware, tbh. You can't get 90 tokens/sec everywhere. :)

As a frontend dev, I'd say LLMs were not meant for the browser. You have to download the weights every time you reload the page. It's impressive that they run well in the browser, but I don't see any practical use cases.


You don't download the weights every time; they are usually stored in OPFS or IndexedDB.
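To see this for yourself, you can inspect what the page has persisted. A rough browser-console sketch using the Origin Private File System is below; note that where the weights actually land varies by library and version (the Cache API and IndexedDB are also common choices), so an empty listing doesn't mean nothing is cached.

```javascript
// Browser-only sketch: list files this origin has stored in OPFS.
// Cached model files here survive page reloads, so weights are
// only downloaded once.
const root = await navigator.storage.getDirectory();
for await (const [name, handle] of root.entries()) {
  console.log(name, handle.kind);
}
```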

I'm getting 11.24 tokens/second on a MacBook M1 Pro.