ONNX conversion

#9
by Berrisius - opened

Trying to convert the parakeet-tdt-0.6b-v2 model to ONNX format for deployment, but I'm unsure how to proceed with the export. Has anyone successfully converted this model or can share guidance on the correct steps?

Would also love to see TransformersJS support for this model @Xenova πŸ™

Thanks @nithinraok - exporting was as simple as following that guide. I've uploaded the converted model to https://huggingface.co/onnx-community/parakeet-tdt-0.6b-v2-ONNX.

@hammeiam Feel free to open a feature request on GitHub. I don't have a lot of bandwidth at the moment, so hopefully a community member is interested in writing the inference code for it.

@Xenova
Is there an example to follow for writing the ONNX inference code, especially for a streaming implementation?

Thanks @nithinraok, I got it!
Currently, the ONNX export only covers the encoder-decoder part of the model. How can I export the full model to ONNX, including the preprocessor and the decoding step?

Hi @Xenova. Sorry to bother you, but I'm getting a RuntimeError: narrowing_error when exporting to ONNX on a T4. Would you know the solution? Here is my code:

import nemo.collections.asr as nemo_asr
from nemo.collections.asr.models import EncDecRNNTBPEModel

asr_model = nemo_asr.models.ASRModel.restore_from("./parakeet-tdt-0.6b-v2.nemo")
assert isinstance(asr_model, EncDecRNNTBPEModel)
asr_model.freeze()
asr_model.to("cuda")

asr_model.export("./parakeet.onnx")

Thanks in advance!

Maybe someone will be interested - I recently made a Python package, onnx-asr, for ASR inference via ONNX with minimal dependencies (no PyTorch, NeMo, or transformers), and it supports parakeet-tdt-0.6b-v2.
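For anyone who wants to try it, usage looks roughly like this (a sketch based on the package's documented API; the model alias and audio filename below are assumptions, and the weights are downloaded from the Hub on first use):

```python
# Sketch: transcribe a WAV file with onnx-asr (pip install onnx-asr).
# The model alias "nemo-parakeet-tdt-0.6b-v2" and "audio.wav" are
# illustrative assumptions; check the package docs for exact names.
import onnx_asr

model = onnx_asr.load_model("nemo-parakeet-tdt-0.6b-v2")
print(model.recognize("audio.wav"))
```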

@istupakov would it work on WebGPU to run in the browser?

My package, onnx-asr, is written in Python, so it won't work in the browser.

As for running this model in a browser in principle: in addition to the encoder and decoder, you need a preprocessor and decoding code. The encoder and decoder are saved when exporting the model to ONNX in NeMo, and an ONNX preprocessor can be taken from my library, but you will have to write the decoding code in JS yourself.

Alternatively, you can export the model not with the TDT decoder but with a CTC head (as far as I understand, the model supports this); in that case the decoding code is quite trivial.
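To illustrate why CTC decoding is trivial: the CTC head emits one score row per audio frame over the vocabulary plus a blank token, and greedy decoding is just argmax per frame, collapse consecutive repeats, drop blanks. A minimal sketch in Python (the vocabulary and blank id here are illustrative, not taken from the actual model):

```python
# Greedy CTC decoding sketch. Assumes blank token id 0 and a toy vocab;
# real models define their own vocab and blank index.
BLANK = 0

def ctc_greedy_decode(logits, id_to_token):
    """logits: list of per-frame score lists; id_to_token: vocab lookup."""
    # Argmax per frame.
    ids = [max(range(len(frame)), key=frame.__getitem__) for frame in logits]
    tokens = []
    prev = None
    for i in ids:
        # Collapse consecutive repeats and skip blank frames.
        if i != prev and i != BLANK:
            tokens.append(id_to_token[i])
        prev = i
    return "".join(tokens)

# Tiny worked example with a 3-entry vocab (blank, "a", "b"):
vocab = {1: "a", 2: "b"}
frames = [
    [0.1, 0.8, 0.1],    # "a"
    [0.1, 0.7, 0.2],    # "a" again -> collapsed
    [0.9, 0.05, 0.05],  # blank -> dropped
    [0.1, 0.2, 0.7],    # "b"
]
print(ctc_greedy_decode(frames, vocab))  # prints "ab"
```

The same few lines port directly to JS for browser use; the only model-specific parts are the vocabulary and the blank index.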

I wrote an audio-to-text demo in C++ based on onnx-asr that loads the parakeet-tdt-0.6b-v2-ONNX model for ASR inference. The results are quite good, and I hope you all like it.
