These are phoneme-level forced alignment models for text-to-speech (TTS) tasks.

They also localize pauses in speech with high accuracy, which can be useful for training voice activity detection (VAD) models.
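As a rough illustration of the VAD use case, the sketch below turns a list of aligned segments into frame-level speech/non-speech labels. The segment format `(label, start_sec, end_sec)`, the pause symbol `"<SIL>"`, and the function name `segments_to_vad_labels` are all assumptions made for this example; they are not taken from the SpeechFlow API, so adapt them to whatever the aligner actually outputs.

```python
from typing import List, Tuple

# Hypothetical alignment format: (label, start_sec, end_sec).
Segment = Tuple[str, float, float]

# Assumed pause symbol; the real aligner may use a different one.
PAUSE_LABEL = "<SIL>"


def segments_to_vad_labels(
    segments: List[Segment],
    duration: float,
    frame_shift: float = 0.01,
) -> List[int]:
    """Return one label per frame: 1 for speech, 0 for pause/silence."""
    n_frames = int(round(duration / frame_shift))
    labels = [0] * n_frames  # frames not covered by any phoneme stay 0
    for label, start, end in segments:
        if label == PAUSE_LABEL:
            continue  # pauses are left as non-speech
        first = max(0, int(round(start / frame_shift)))
        last = min(n_frames, int(round(end / frame_shift)))
        for i in range(first, last):
            labels[i] = 1
    return labels


if __name__ == "__main__":
    example = [("<SIL>", 0.0, 0.02), ("a", 0.02, 0.05), ("<SIL>", 0.05, 0.06)]
    print(segments_to_vad_labels(example, duration=0.06))
```

Labels produced this way could serve as training targets for a VAD model, with the frame shift chosen to match the feature extractor.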

For documentation and usage examples, please refer to the SpeechFlow project.
