quantized model?
Are there any options to request or generate an 8-bit or 4-bit quantized version of the model? This would be a hoot to use on Raspberry Pi 5 and Rockchip SBCs.
Thanks!
Alan
Actually I answered my own question... kinda
This library and ONNX file transcribe the sample WAV file in under half the time on an RK3588 SBC (1.6 seconds vs. 3.7 seconds for 7 seconds of audio):
https://github.com/istupakov/onnx-asr
https://huggingface.co/istupakov/parakeet-tdt-0.6b-v2-onnx
I'd still like to find an 8bit or 4bit quantized version of the file but this parakeet model is fast either way. :-)
Hi @gestalt73!
You can use the 8-bit quantized version with my library:
import onnx_asr
model = onnx_asr.load_model("nemo-parakeet-tdt-0.6b-v2", quantization="int8")
print(model.recognize("test.wav"))
Oh nice! Thanks! I missed that.
Updated stats on my rockchip rk3588 sbc with the 7 second sample:
- nemo transcription: 3.7 seconds, 1.89x realtime
- onnx_asr (16bit): 1.5 seconds, 4.66x realtime
- onnx_asr (8bit): 0.9 seconds, 7.77x realtime
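For anyone who wants to reproduce the "x realtime" figures, here is a minimal sketch of the arithmetic: audio length divided by processing time. The helper name `realtime_factor` is just something made up for illustration, and the timings are the ones from the list above.

```python
def realtime_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Seconds of audio transcribed per second of wall-clock time."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


# Timings from the stats above: a 7 second sample on the rk3588.
AUDIO_SECONDS = 7.0
timings = {
    "nemo": 3.7,
    "onnx_asr (16bit)": 1.5,
    "onnx_asr (8bit)": 0.9,
}

for name, seconds in timings.items():
    factor = realtime_factor(AUDIO_SECONDS, seconds)
    print(f"{name}: {seconds} seconds, {factor:.2f}x realtime")
```

This rounds to two decimals instead of truncating, so the last digit can differ by one from the numbers listed above.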
https://github.com/NullSense/Parrator/
I made this simple tool. It runs as a daemon, lets you start and stop recording with a shortcut, supports auto-pasting, and is configurable.
Perhaps some of you will like it. Quite amazed at the speed of Parakeet.
Daemon: Transcription Stats - Chars: 256, Words: 47
Daemon: Attempting to auto-paste from clipboard in 0.5 seconds...
Ensure a text field is active and focused!
Daemon: Paste simulated.
--- Performance Summary (Daemon Mode) ---
Total time (rec start to paste end): 15.092s
Recording duration: 14.136s
VAD processing duration: 0.002s
Audio processing after VAD: 0.003s
ASR Transcription duration: 0.334s
Clipboard & Paste duration: 0.583s
----------------------------------------
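To put the summary above in "x realtime" terms, here is a rough sketch that parses the duration lines and divides recording length by ASR time. The `parse_durations` helper is hypothetical (it is not part of the tool), and the regex assumes every line looks like `label: 1.234s`, as in the log above.

```python
import re

# Duration lines copied from the performance summary above.
SUMMARY = """\
Total time (rec start to paste end): 15.092s
Recording duration: 14.136s
VAD processing duration: 0.002s
Audio processing after VAD: 0.003s
ASR Transcription duration: 0.334s
Clipboard & Paste duration: 0.583s
"""


def parse_durations(text: str) -> dict:
    """Map each 'label: 1.234s' line to its duration in seconds."""
    pattern = re.compile(r"^(.+?):\s*([0-9.]+)s\s*$", re.MULTILINE)
    return {label.strip(): float(value) for label, value in pattern.findall(text)}


durations = parse_durations(SUMMARY)
recording = durations["Recording duration"]
asr = durations["ASR Transcription duration"]
total = durations["Total time (rec start to paste end)"]

# How much faster than realtime the transcription step ran.
print(f"ASR speed: {recording / asr:.1f}x realtime")
# Time the user waits after recording stops until the paste is done.
print(f"Latency after recording: {total - recording:.3f}s")
```

With these numbers the transcription step works out to roughly 42x realtime, and most of the wait after recording stops is the clipboard and paste step rather than the ASR.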
Some dirty benchmarking. Running on RX 6750XT Win11