KV Caching Explained: Optimizing Transformer Inference Efficiency
β’
12
It seems to be working on my side. You can either read the full blog post at https://huggingface.co/blog/not-lain/tensor-dims, or click on the dropdown menu, which will expand the rest of the post here.
The short version: KV caching gives you faster and more consistent inference, at the cost of higher GPU memory consumption.
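To make that trade-off concrete, here is a minimal sketch of a single-head attention decoding step with a KV cache, in plain JavaScript. Everything in it (`kCache`, `vCache`, `attendOneToken`) is hypothetical illustration code under simplified assumptions, not an API from the blog post:

```js
// Minimal KV-cache sketch for one attention head (illustrative only).
// Without a cache, decoding step t would recompute K and V for all
// t previous tokens; with a cache, each step appends one new row.

const kCache = []; // one cached key vector per generated token
const vCache = []; // one cached value vector per generated token

function dot(a, b) {
  return a.reduce((s, x, i) => s + x * b[i], 0);
}

function softmax(xs) {
  const m = Math.max(...xs); // subtract max for numerical stability
  const exps = xs.map((x) => Math.exp(x - m));
  const sum = exps.reduce((s, x) => s + x, 0);
  return exps.map((x) => x / sum);
}

// One decoding step: project only the NEW token's q, k, v; reuse the rest.
function attendOneToken(q, kNew, vNew) {
  kCache.push(kNew); // the only new K/V computation this step...
  vCache.push(vNew); // ...but memory grows linearly with sequence length
  const scale = 1 / Math.sqrt(q.length);
  const weights = softmax(kCache.map((k) => dot(q, k) * scale));
  // Weighted sum of cached values = attention output for the new token.
  return vCache[0].map((_, d) =>
    weights.reduce((s, w, t) => s + w * vCache[t][d], 0)
  );
}

// Each step costs O(t) attention work instead of recomputing O(t) K/V
// projections on top of it; the cache arrays are what eat the extra memory.
const out = attendOneToken([1, 0], [1, 0], [0.5, 0.5]);
console.log(out);
```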
```
pip install kokoro
```

…and still 82M parameters. The same model also runs in JavaScript via the `kokoro-js` package:

```js
import { KokoroTTS } from "kokoro-js";

const tts = await KokoroTTS.from_pretrained(
  "onnx-community/Kokoro-82M-ONNX",
  { dtype: "q8" }, // one of: fp32, fp16, q8, q4, q4f16
);

const text = "Life is like a box of chocolates. You never know what you're gonna get.";
const audio = await tts.generate(text, {
  voice: "af_sky", // see `tts.list_voices()` for the available voices
});
audio.save("audio.wav");
```
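A note on the `dtype` option above: as with quantization in general, the lower-precision variants (`q8`, `q4`, `q4f16`) should shrink the download size and memory footprint at some cost in audio fidelity, while `fp32` is the full-precision reference; which one is acceptable is worth testing for your use case.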