quantized model?

#26
by gestalt73 - opened

Are there any options to request or generate an 8-bit or 4-bit quantized version of the model? This would be a hoot to use on Raspberry Pi 5 and Rockchip SBCs.

Thanks!

Alan

Actually I answered my own question... kinda

This library and ONNX file transcribe the sample WAV file in under half the time on an RK3588 SBC (1.6 seconds vs 3.7 seconds for 7 seconds of audio):

https://github.com/istupakov/onnx-asr
https://huggingface.co/istupakov/parakeet-tdt-0.6b-v2-onnx

I'd still like to find an 8bit or 4bit quantized version of the file but this parakeet model is fast either way. :-)

Hi @gestalt73 !
You can use the 8-bit quantized version with my library:

import onnx_asr
model = onnx_asr.load_model("nemo-parakeet-tdt-0.6b-v2", quantization="int8")
print(model.recognize("test.wav"))

Oh nice! Thanks! I missed that.

Updated stats on my rockchip rk3588 sbc with the 7 second sample:

  • nemo transcription: 3.7 seconds, 1.89x realtime
  • onnx_asr (16bit): 1.5 seconds, 4.66x realtime
  • onnx_asr (8bit): 0.9 seconds, 7.77x realtime

gestalt73 changed discussion status to closed
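
The realtime figures in the stats above appear to be the audio length divided by the transcription time (for example, 7.0 / 3.7 is roughly 1.89). For anyone who wants to reproduce that kind of measurement, here is a minimal sketch using only the standard library. The helper names `audio_duration_seconds`, `realtime_factor`, and `benchmark` are made up for illustration and are not part of onnx_asr or NeMo; the sketch also assumes a plain PCM WAV file and times a single call, with model loading excluded.

```python
import time
import wave


def audio_duration_seconds(wav_path):
    """Return the length of a PCM WAV file in seconds."""
    with wave.open(wav_path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def realtime_factor(audio_seconds, processing_seconds):
    """Seconds of audio handled per second of wall-clock time."""
    return audio_seconds / processing_seconds


def benchmark(transcribe, wav_path):
    """Time one call to transcribe(wav_path).

    `transcribe` is any callable that takes a file path and returns text
    (hypothetical interface chosen for this sketch).
    Returns (text, elapsed_seconds, realtime_factor).
    """
    start = time.perf_counter()
    text = transcribe(wav_path)
    elapsed = time.perf_counter() - start
    duration = audio_duration_seconds(wav_path)
    return text, elapsed, realtime_factor(duration, elapsed)
```

With the model loaded as in the snippet earlier in the thread, something like `text, elapsed, rtf = benchmark(model.recognize, "test.wav")` should give comparable numbers. A single run can be noisy, so averaging a few runs is probably worthwhile.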

https://github.com/NullSense/Parrator/

I made this simple tool. It runs as a daemon, and a shortcut starts and stops recording. Auto-pasting is supported, and it is configurable.

Perhaps some of you will like it. I'm quite amazed at the speed of Parakeet.

Daemon: Transcription Stats - Chars: 256, Words: 47
Daemon: Attempting to auto-paste from clipboard in 0.5 seconds...
         Ensure a text field is active and focused!
Daemon: Paste simulated.

--- Performance Summary (Daemon Mode) ---
Total time (rec start to paste end): 15.092s
  Recording duration:                  14.136s
  VAD processing duration:             0.002s
  Audio processing after VAD:        0.003s
  ASR Transcription duration:          0.334s
  Clipboard & Paste duration:        0.583s
----------------------------------------

Some dirty benchmarking, run on an RX 6750 XT under Windows 11.
