mlx-community/supertonic-2

This model was converted to MLX format from Supertone/supertonic-2 using mlx-audio version 0.2.8.

SuperTonic 2 is a high-quality text-to-speech model with voice style control.

Use with mlx-audio

pip install -U mlx-audio

CLI Example:

mlx_audio.tts.generate --model mlx-community/supertonic-2 --text "Hello, this is a test." --voice M1

Python Example:

from mlx_audio.tts.utils import load_model

model = load_model("mlx-community/supertonic-2")
for result in model.generate("Hello, this is a test.", voice="M1"):
    print(f"Generated {result.audio_duration} of audio")
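
To keep the generated speech, the samples can be written to a WAV file at the model's 44100 Hz sample rate. The sketch below uses only the Python standard library and a synthetic tone as a stand-in for model output. It assumes the audio is available as mono float samples in [-1.0, 1.0] (for example, after converting `result.audio` to a Python list, which is an assumption about the result object). `write_wav` is a hypothetical helper, not part of mlx-audio.

```python
import math
import os
import struct
import tempfile
import wave

SAMPLE_RATE = 44100  # sample rate listed under Model Details


def write_wav(path, samples, sample_rate=SAMPLE_RATE):
    """Write mono float samples in [-1.0, 1.0] as 16-bit PCM; return seconds written."""
    frames = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))  # clip to the valid range
        frames += struct.pack("<h", int(round(s * 32767)))
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)      # mono
        wf.setsampwidth(2)      # 16-bit samples
        wf.setframerate(sample_rate)
        wf.writeframes(bytes(frames))
    return len(samples) / sample_rate


# Stand-in for generated audio: half a second of a 440 Hz tone.
tone = [0.3 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)
        for n in range(SAMPLE_RATE // 2)]
wav_path = os.path.join(tempfile.gettempdir(), "supertonic_demo.wav")
duration = write_wav(wav_path, tone)
print(f"Wrote {duration:.2f} s of audio to {wav_path}")
```

With real model output, replace `tone` with the generated samples and keep the sample rate at 44100 so playback speed and pitch are correct.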

Model Details

Architecture: Text encoder + Duration predictor + Flow matching (vector field) + Vocoder
Sample rate: 44100 Hz
Voices: M1-M5, F1-F5 (10 built-in voice styles)
Latent dim: 24 per frame (144 after chunking, i.e. 6 consecutive frames stacked per chunk)
Flow matching steps: 10 by default (configurable)
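
The latent chunking and the fixed-step flow matching sampler listed above can be illustrated with a small pure-Python sketch. This is not the model's implementation: the chunk factor of 6 is inferred from 144 / 24, the stacking order is an assumption, and `toy_vector_field` is a hypothetical stand-in for the learned network that simply points straight at a known target.

```python
import random

LATENT_DIM = 24      # per-frame latent size, from the list above
CHUNKED_DIM = 144    # chunked latent size, from the list above
CHUNK = CHUNKED_DIM // LATENT_DIM  # 6 frames per chunk (inferred, not confirmed)
NUM_STEPS = 10       # number of flow matching steps from the list above


def chunk_latents(frames):
    """Stack every CHUNK consecutive 24-dim frames into one 144-dim vector."""
    assert len(frames) % CHUNK == 0, "pad the sequence to a multiple of CHUNK first"
    return [
        [value for frame in frames[i:i + CHUNK] for value in frame]
        for i in range(0, len(frames), CHUNK)
    ]


def toy_vector_field(x, t, target):
    """Hypothetical stand-in for the learned network: points straight at target."""
    return [(g - xi) / (1.0 - t) for xi, g in zip(x, target)]


def euler_sample(x0, target, num_steps=NUM_STEPS):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-size Euler steps."""
    x, dt = list(x0), 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t takes values 0, 1/N, ..., (N-1)/N and never reaches 1
        v = toy_vector_field(x, t, target)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


rng = random.Random(0)
frames = [[rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)] for _ in range(12)]
chunks = chunk_latents(frames)  # 12 frames -> 2 chunks of 144 values
noise = [rng.gauss(0.0, 1.0) for _ in range(CHUNKED_DIM)]
sample = euler_sample(noise, chunks[0])
error = max(abs(a - b) for a, b in zip(sample, chunks[0]))
print(len(chunks), len(chunks[0]), error)
```

In the real model the vector field is a trained network conditioned on text and voice style, so more steps generally trade speed for accuracy. With the toy field the Euler path lands on the target regardless of step count, which is why this only illustrates the loop structure.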
