Uploaded model

  • Developed by: jsbeaudry
  • License: apache-2.0
  • Finetuned from model: unsloth/csm-1b

sesame-creole-tts

This model is a fine-tuned version of unsloth/csm-1b on a mix of the jsbeaudry/creole-text-voice and jsbeaudry/cmu_haitian_creole_speech datasets.

Demo

Colab Demo

🧠 Model Description

sesame-creole-tts is a text-to-speech (TTS) model for Haitian Creole (Kreyòl Ayisyen). It was fine-tuned on 5,000+ curated audio-text pairs to synthesize intelligible Creole speech for use cases including education, accessibility, and conversational AI.

  • Trained for: Haitian Creole Text-to-Speech
  • Dataset: Over 5,000 Haitian Creole sentence-to-audio pairs
  • Voice Type: Male and female synthetic and natural voices with clear articulation and a native accent
  • Sampling Rate: 16 kHz
  • Phonetics: Uses standardized Creole orthography with support for diacritics (see the input-format example after this list)
  • Objective: Generate natural and expressive Haitian Creole speech for daily communication, education tools, and virtual assistants
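As the usage scripts below show, input text is prefixed with a speaker ID in square brackets. The snippet below is only illustrative; the sentences are examples, not from the training data:

# Speaker IDs correspond to the voices used in the Gradio demo below (0-4)
texts = [
    "[0]Bonjou tout moun, koman nou ye?",
    "[1]Mwen pale kreyòl ayisyen.",  # diacritics such as ò are supported
]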

📊 Training and evaluation data

The model was trained on a mix of the jsbeaudry/creole-text-voice and jsbeaudry/cmu_haitian_creole_speech datasets (a loading sketch follows the list), which include:

  • 8 hours of synthetic Haitian Creole speech
  • Annotated, time-aligned text transcripts following Creole orthography
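For quick inspection, both datasets can be pulled from the Hugging Face Hub with the datasets library. This is a minimal sketch; the split and column names are not guaranteed here and should be checked against the dataset cards:

from datasets import load_dataset

# Download both training datasets from the Hub
creole_tts = load_dataset("jsbeaudry/creole-text-voice")
cmu_creole = load_dataset("jsbeaudry/cmu_haitian_creole_speech")

# Print the available splits and columns before assuming any names
print(creole_tts)
print(cmu_creole)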

Model usage script

Inference console
Install packages:
pip install torch transformers soundfile


import torch
from transformers import CsmForConditionalGeneration, AutoProcessor
from IPython.display import Audio, display
import soundfile as sf  # for writing the generated waveform to a WAV file

model_id = "jsbeaudry/sesame-creole-tts"
device = "cuda" if torch.cuda.is_available() else "cpu"

# load the model and the processor
processor = AutoProcessor.from_pretrained(model_id)
model = CsmForConditionalGeneration.from_pretrained(model_id, device_map=device)


# prepare the inputs
text = "[0]Bonjou tout moun koman nou ye?" # `[0]` for speaker id 0
inputs = processor(text, add_special_tokens=True).to(device)

audio = model.generate(**inputs, output_audio=True)

# Move the audio tensor to the CPU and convert to numpy array before saving with soundfile
audio_numpy = audio[0].to(torch.float32).cpu().numpy()

sf.write("example_without_context.wav", audio_numpy, 24000)
display(Audio(audio_numpy, rate=24000))
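The base sesame/csm-1b card writes audio through the processor instead of soundfile; if your installed transformers version exposes that helper, it can be used as an alternative:

# Alternative: let the CSM processor write the file (recent transformers versions)
processor.save_audio(audio, "example_via_processor.wav")
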
Inference with Gradio
Install packages:
pip install torch transformers gradio


import gradio as gr
import torch
from transformers import CsmForConditionalGeneration, AutoProcessor

model_id = "jsbeaudry/sesame-creole-tts"
device = "cuda" if torch.cuda.is_available() else "cpu"

# load the model and the processor
processor = AutoProcessor.from_pretrained(model_id)
model = CsmForConditionalGeneration.from_pretrained(model_id, device_map=device)

def text_to_speech(text, speaker_name):
    speaker_map = {
         "Aleya": 0,
        "Mariz": 1,
        "Anita": 2,
        "Sanit": 3,
        "Jak": 4
    }
    speaker_id = speaker_map[speaker_name]
    # prepare the inputs
    inputs = processor(f"[{speaker_id}]{text}", add_special_tokens=True).to(device)
    # infer the model
    audio = model.generate(**inputs, output_audio=True)
    # Move the audio tensor to the CPU and convert to numpy array
    audio_numpy = audio[0].to(torch.float32).cpu().numpy()
    return (24000, audio_numpy)

iface = gr.Interface(
    fn=text_to_speech,
    inputs=[
        gr.Textbox(lines=2, placeholder="Enter Haitian Creole text here..."),
        gr.Dropdown(["Aleya", "Mariz", "Anita", "Sanit", "Jak"], label="Select Speaker")
    ],
    outputs=gr.Audio(label="Generated Audio"),
    title="Haitian Creole Text-to-Speech",
    description="Enter Haitian Creole text to generate speech using the jsbeaudry/sesame-creole-tts model. Select a speaker from the dropdown."
)

iface.launch(debug=True)
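
launch(debug=True) serves the app locally and surfaces errors in the console; when running in a notebook environment such as the Colab demo, Gradio's built-in tunnel can expose a temporary public URL instead:

# Optionally create a temporary public link (useful on Colab)
iface.launch(debug=True, share=True)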

Intended uses & limitations

  • Mixed-language text (Creole with French or English) may be mispronounced.
  • Long sentences may produce unstable pronunciation; a simple chunking workaround is sketched after this list.
  • Voice selection is not fully stable: a given speaker ID may not reproduce the same tone consistently.
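
One practical mitigation for long inputs, assuming model, processor, and device are already loaded as in the usage script above, is to split the text into sentences, synthesize each chunk separately, and concatenate the waveforms. This is only a sketch, not part of the released model:

import re
import numpy as np
import soundfile as sf
import torch

def synthesize_long_text(text, speaker_id=0):
    # Split on sentence-ending punctuation and drop empty chunks
    chunks = [c.strip() for c in re.split(r"(?<=[.!?])\s+", text) if c.strip()]
    pieces = []
    for chunk in chunks:
        inputs = processor(f"[{speaker_id}]{chunk}", add_special_tokens=True).to(device)
        audio = model.generate(**inputs, output_audio=True)
        pieces.append(audio[0].to(torch.float32).cpu().numpy())
    # Concatenate the per-sentence waveforms into one array
    return np.concatenate(pieces)

waveform = synthesize_long_text("Bonjou tout moun. Koman nou ye? Mwen kontan pale ak nou.")
sf.write("long_text.wav", waveform, 24000)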

Training hyperparameters

The following hyperparameters were used during training (an equivalent TrainingArguments sketch follows the list):

  • learning_rate: 2e-4
  • train_batch_size: 2
  • seed: 3407
  • gradient_accumulation_steps: 4
  • optim: adamw_8bit
  • lr_scheduler_type: linear
  • num_epochs: 3
  • training_time: 4:24:03
  • num_step: 4080
  • Trainable parameters = 29,032,448/1,661,132,609 (1.75% trained)
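
The ~1.75% trainable-parameter ratio suggests a parameter-efficient (LoRA-style) fine-tune, likely via Unsloth given the base checkpoint. The sketch below only mirrors the listed values using transformers.TrainingArguments; the output directory, the LoRA configuration, and the Trainer/Unsloth wiring are assumptions and are omitted:

from transformers import TrainingArguments

# Mirrors the hyperparameters listed above; output_dir and the surrounding
# Trainer / Unsloth setup are assumptions, not taken from the original run.
training_args = TrainingArguments(
    output_dir="sesame-creole-tts",
    learning_rate=2e-4,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    num_train_epochs=3,
    seed=3407,
    optim="adamw_8bit",
    lr_scheduler_type="linear",
)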

📌 Citation

If you use this model, please cite:

@misc{sesamecreoletts2025,
  title={sesame creole tts 11k},
  author={Jean Sauvenel Beaudry},
  year={2025},
  howpublished={\url{https://huggingface.co/jsbeaudry}}
}