Multilingual forced alignment

These are phoneme-level forced alignment models for text-to-speech (TTS) tasks.
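Forced alignment maps each phoneme of a known transcript to a contiguous, in-order span of audio frames. The sketch below illustrates the general technique with a monotonic Viterbi search. It is not the SpeechFlow API or these models' actual decoder. It assumes a hypothetical acoustic-model output `log_probs[t][s]`: the log-probability that frame `t` belongs to the `s`-th phoneme of the transcript. The function name `viterbi_align` is illustrative.

```python
import math
from typing import List, Sequence, Tuple


def viterbi_align(log_probs: Sequence[Sequence[float]]) -> List[Tuple[int, int, int]]:
    """Monotonic forced alignment of S transcript phonemes to T frames.

    log_probs[t][s] is a hypothetical per-frame log-probability that frame t
    belongs to the s-th transcript phoneme. Each phoneme gets at least one
    frame, and the path may only stay on a phoneme or advance to the next.
    Returns (phoneme_index, start_frame, end_frame_exclusive) segments.
    """
    T = len(log_probs)
    S = len(log_probs[0])
    if T < S:
        raise ValueError("need at least one frame per phoneme")
    neg_inf = -math.inf
    # score[t][s]: best log-score of any path that is on phoneme s at frame t
    score = [[neg_inf] * S for _ in range(T)]
    # back[t][s]: True if that best path advanced from phoneme s-1 at frame t
    back = [[False] * S for _ in range(T)]
    score[0][0] = log_probs[0][0]  # alignment must start on the first phoneme
    for t in range(1, T):
        for s in range(S):
            stay = score[t - 1][s]
            advance = score[t - 1][s - 1] if s > 0 else neg_inf
            if advance > stay:
                score[t][s] = advance + log_probs[t][s]
                back[t][s] = True
            else:
                score[t][s] = stay + log_probs[t][s]
    # Backtrack from the last phoneme on the last frame, recording where
    # each phoneme began.
    segments = []
    s, end = S - 1, T
    for t in range(T - 1, 0, -1):
        if back[t][s]:
            segments.append((s, t, end))
            end = t
            s -= 1
    segments.append((s, 0, end))
    return segments[::-1]
```

For example, with five frames and three phonemes where frames 0 and 1 favor the first phoneme, frame 2 the second, and frames 3 and 4 the third, `viterbi_align` returns `[(0, 0, 2), (1, 2, 3), (2, 3, 5)]`. Multiplying frame indices by the model's frame hop yields phoneme timestamps.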

They also localize pauses in speech with high accuracy, which can be useful for training voice activity detection (VAD) models.
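One way to reuse the pause boundaries for VAD training is to convert timed alignment segments into frame-level speech/non-speech targets. The sketch below is a minimal illustration under stated assumptions and not part of SpeechFlow. It assumes segments arrive as `(label, start_sec, end_sec)` tuples. It also assumes pauses carry one of the hypothetical labels in `PAUSE_LABELS`. The real pause symbols depend on the model's phoneme set. The 10 ms hop is an arbitrary choice.

```python
from typing import List, Sequence, Tuple

# Hypothetical pause/silence symbols; adjust to the aligner's actual phoneme set.
PAUSE_LABELS = {"sil", "sp", "<pause>"}


def alignment_to_vad_labels(
    segments: Sequence[Tuple[str, float, float]],
    total_sec: float,
    hop_sec: float = 0.01,
) -> List[int]:
    """Convert (label, start_sec, end_sec) alignment segments into per-frame
    VAD targets: 1 = speech, 0 = pause/non-speech.

    Frames not covered by any non-pause segment stay 0.
    """
    n_frames = int(round(total_sec / hop_sec))
    labels = [0] * n_frames
    for label, start, end in segments:
        if label in PAUSE_LABELS:
            continue  # pauses remain non-speech
        first = int(round(start / hop_sec))
        last = min(int(round(end / hop_sec)), n_frames)
        for i in range(first, last):
            labels[i] = 1
    return labels
```

For instance, the segments `[("sil", 0.0, 0.03), ("a", 0.03, 0.05), ("sp", 0.05, 0.06), ("b", 0.06, 0.08)]` over 0.1 s produce `[0, 0, 0, 1, 1, 0, 1, 1, 0, 0]`. Rounding boundaries to the nearest frame keeps floating-point noise (e.g. `0.03 / 0.01` evaluating slightly below 3) from shifting labels by one frame.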

For documentation and usage examples, please refer to the SpeechFlow project.

Figure: segmentation example
