Multilingual forced alignment
These models perform phoneme-level forced alignment for text-to-speech (TTS) tasks.
They also localize pauses in speech with high accuracy, which can be useful for training voice activity detection (VAD) models.
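As a rough illustration of how pause localization could feed VAD training, the sketch below turns phoneme-level alignment segments into frame-level speech/non-speech labels. The segment format `(label, start_sec, end_sec)`, the pause marker `SIL`, and the function name `vad_labels` are all hypothetical and are not taken from these models or from SpeechFlow; adapt them to the actual alignment output.

```python
from typing import List, Tuple

# Hypothetical alignment format: (label, start_sec, end_sec).
Segment = Tuple[str, float, float]

# Assumed pause marker; the real symbol used by the aligner may differ.
PAUSE_LABEL = "SIL"


def vad_labels(
    segments: List[Segment], total_sec: float, frame_sec: float = 0.01
) -> List[int]:
    """Return one label per frame: 1 for speech, 0 for pause or unlabeled audio."""
    n_frames = int(round(total_sec / frame_sec))
    labels = [0] * n_frames
    for label, start, end in segments:
        if label == PAUSE_LABEL:
            continue  # pauses stay labeled as non-speech
        first = max(int(round(start / frame_sec)), 0)
        last = min(int(round(end / frame_sec)), n_frames)
        for i in range(first, last):
            labels[i] = 1
    return labels


example = [("SIL", 0.0, 0.02), ("h", 0.02, 0.05), ("SIL", 0.05, 0.06)]
frame_targets = vad_labels(example, total_sec=0.06)
```

Labels of this kind could serve as frame-level targets for a VAD classifier; the 10 ms frame size is only an illustrative default.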
For documentation and usage examples, refer to the SpeechFlow project.