Is it possible to use "prompt" or "hotwords" to steer decoding similar to Whisper?

#8
by spashii - opened

It should at least be possible with a text corpus, using n-gram language model customization: https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/asr_customization/ngpulm_language_modeling_and_customization.html#ngpulm-ngram-modeling

Training the n-gram model is really fast.
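
For intuition on why training is quick: an n-gram model is essentially counts over the corpus. Here is a toy sketch in plain Python, not the NeMo implementation. The function names `train_bigram_counts` and `bigram_prob` and the two-line corpus are made up for illustration, and real toolkits add smoothing and pruning on top of this.

```python
from collections import Counter

def train_bigram_counts(corpus_lines):
    """Count context unigrams and bigrams over whitespace-tokenized lines."""
    unigrams, bigrams = Counter(), Counter()
    for line in corpus_lines:
        tokens = ["<s>"] + line.lower().split() + ["</s>"]
        unigrams.update(tokens[:-1])             # every token that acts as a context
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent token pairs
    return unigrams, bigrams

def bigram_prob(unigrams, bigrams, prev, word):
    """Maximum-likelihood estimate of P(word | prev); 0.0 for an unseen context."""
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / unigrams[prev]

# Hypothetical two-sentence domain corpus.
corpus = ["transcribe the meeting", "transcribe the call"]
uni, bi = train_bigram_counts(corpus)
```

A single pass over the text is all it takes, which is why even a fairly large corpus trains in seconds.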

But whenever I add it and transcribe an audio file longer than 40 seconds (still under a minute), the output drops a bunch of sentences.

They also have word boosting, but it doesn't seem to work on an AED model like Canary:

https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/asr_customization/word_boosting.html#
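
To make the idea concrete, word boosting amounts to giving hypotheses that contain the target words a score bonus. The sketch below is a simplified n-best rescoring version in plain Python. It is not how NeMo implements it, and the function name `rescore_with_boost`, the `bonus` value, and the example hypotheses are all assumptions for illustration.

```python
def rescore_with_boost(hypotheses, boost_words, bonus=2.0):
    """Pick the best (text, log_score) pair after adding `bonus` per boosted word hit."""
    boosted = {w.lower() for w in boost_words}
    rescored = []
    for text, score in hypotheses:
        # Count how many tokens in this hypothesis are on the boost list.
        hits = sum(1 for tok in text.lower().split() if tok in boosted)
        rescored.append((text, score + bonus * hits))
    return max(rescored, key=lambda pair: pair[1])

# Hypothetical n-best list: the second entry has the higher raw score.
nbest = [("the nemo toolkit", -4.0), ("the nemo tool kit", -3.5)]
best = rescore_with_boost(nbest, ["toolkit"])
```

With the boost applied, the hypothesis containing "toolkit" wins even though its raw score was lower. Rescoring like this only helps if the right word already shows up somewhere in the n-best list.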
