Transcribe American English Spelling v1 Tiny ctranslate2

Specialty Speech-to-Text (Transcription / Automatic Speech Recognition) Model

This is the first release of this model; performance results are shown below. Report any errors by posting under Community on the model repo card, and they will be fixed in future releases.

For all available models, see this HuggingFace collection. For ctranslate2 variants (useful for Faster Whisper), add -ctranslate2 to any model slug.
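The slug convention above can be expressed as a tiny helper (the function name here is ours, purely illustrative):

```python
def ctranslate2_slug(slug: str) -> str:
    """Append -ctranslate2 to a model slug, if not already present (illustrative)."""
    return slug if slug.endswith("-ctranslate2") else slug + "-ctranslate2"

print(ctranslate2_slug("Trelis/transcribe-en_us-spelling-v1-tiny"))
# -> Trelis/transcribe-en_us-spelling-v1-tiny-ctranslate2
```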

While the training datasets are private, the library used for English variant conversions is open sourced here.

Background on Whisper English Variants

Whisper models disproportionately transcribe into US English, particularly when the audio contains no obviously British English words (e.g. "rubbish" vs "trash" / "garbage").

Trelis British Spelling and American Spelling transcription models aim to make outputs uniformly follow either US or British spelling.

Note that these models do not swap different words with the same meaning: they will use the correct variant of colour vs color, but will not swap "trash" for "rubbish". For updates on such a model (a "lexical" variant), subscribe at trelis.substack.com.
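To illustrate the distinction, here is a minimal spelling-only converter. The word list is a toy subset and the function names are ours; the actual conversion library is the open-sourced one linked above:

```python
import re

# Toy subset of a British -> American spelling map (illustrative only).
GB_TO_US = {"colour": "color", "organise": "organize", "centre": "center"}

def to_us_spelling(text: str) -> str:
    """Replace exact British spellings with American ones, preserving case."""
    def swap(match: re.Match) -> str:
        word = match.group(0)
        repl = GB_TO_US.get(word.lower())
        if repl is None:
            return word  # lexical differences (e.g. "rubbish") pass through untouched
        # Preserve simple Title-case capitalisation of the original word.
        return repl.capitalize() if word[0].isupper() else repl

    return re.sub(r"[A-Za-z]+", swap, text)

print(to_us_spelling("The Colour centre"))  # -> The Color center
```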

Performance

Trelis Transcribe models are fine-tunes of Whisper models.

Performance is compared on three metrics:

  • Word Error Rate (WER) on two datasets: LibriSpeech and Trelis/transcribe-to-en_GB-v1 or Trelis/transcribe-to-en_US-v1
  • US -> GB %, i.e. the percentage of transcript words with American English spelling, on Trelis/transcribe-to-en_GB-v1 or Trelis/transcribe-to-en_US-v1
  • GB -> US %, i.e. the percentage of transcript words with British English spelling, on Trelis/transcribe-to-en_GB-v1 or Trelis/transcribe-to-en_US-v1

US and GB percentages are measured deterministically via a list of ~6,000 exact-match British <-> American English word pairs.
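A minimal sketch of such a deterministic measurement (the pair list here is a toy subset and the function name is ours):

```python
import re

# Toy subset of the British <-> American exact-match pair list (illustrative only).
PAIRS = [("colour", "color"), ("organise", "organize"), ("centre", "center")]
GB_WORDS = {gb for gb, _ in PAIRS}
US_WORDS = {us for _, us in PAIRS}

def variant_percentages(transcript: str) -> tuple[float, float]:
    """Return (% US-spelled words, % GB-spelled words) in the transcript."""
    words = re.findall(r"[a-z]+", transcript.lower())
    if not words:
        return 0.0, 0.0
    us = sum(w in US_WORDS for w in words)
    gb = sum(w in GB_WORDS for w in words)
    return 100 * us / len(words), 100 * gb / len(words)

us_pct, gb_pct = variant_percentages("The colour of the center")
print(f"US: {us_pct:.1f}%  GB: {gb_pct:.1f}%")  # -> US: 20.0%  GB: 20.0%
```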

Test datasets:

  • Trelis/transcribe-to-en_GB-v1 or Trelis/transcribe-to-en_US-v1 - 30 rows of synthetic English voice data, evenly split across GB and US English source text, GB and US accents, male and female voices, and mixed speeds, with transcripts converted to the target spelling.
  • openslr/librispeech_asr - 50 rows from the test.other split, which contains mixed English-language samples with high WER.
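For reference, the WER metric reported in the tables below is standard word-level edit distance divided by reference length; a minimal sketch (assuming text is already normalised):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # DP table: d[i][j] = edit distance between ref[:i] and hyp[:j].
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

print(f"{wer('the colour is red', 'the color is red'):.2f}")  # -> 0.25
```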

British (en_GB) Variant Transcription Performance

While the original Whisper models transcribe ~6% of this test set into American English, the fine-tuned models reduce that towards 1% (and ~0.2% for the turbo model).

Dataset: Trelis/transcribe-to-en_GB-v1
Config: N/A
Split: test
Text Column: text

| Timestamp | Model | WER % | Samples (Eval/Total/Skipped) | US→GB % | GB→US % | Normalized | Device |
|---|---|---|---|---|---|---|---|
| 2025-12-02 12:21:43 | openai/whisper-tiny | 10.06% | 30/30/0 | 6.12% | 0.54% | Yes | mps |
| 2025-12-02 12:16:42 | Trelis/transcribe-en_gb-spelling-v1-tiny | 4.58% | 30/30/0 | 1.01% | 5.64% | Yes | mps |
| 2025-12-02 12:28:33 | openai/whisper-large-v3-turbo | 7.15% | 30/30/0 | 5.27% | 1.62% | Yes | mps |
| 2025-12-02 13:11:01 | Trelis/transcribe-en_gb-spelling-v1-turbo | 1.18% | 30/30/0 | 0.20% | 6.70% | Yes | mps |

American (en_US) Variant Transcription Performance

Original Whisper models already tend to transcribe into American English, so the improvement from fine-tuning is smaller here, although the turbo model still improves by roughly 1.5 percentage points.

Dataset: Trelis/asr-en_mixed-to-en_US-tts-test-20251202-105023
Config: N/A
Split: test
Text Column: text

| Timestamp | Model | WER % | Samples (Eval/Total/Skipped) | US→GB % | GB→US % | Normalized | Device |
|---|---|---|---|---|---|---|---|
| 2025-12-02 11:03:11 | openai/whisper-tiny | 4.93% | 30/30/0 | 6.32% | 0.54% | Yes | mps |
| 2025-12-02 13:43:28 | Trelis/transcribe-en_us-spelling-v1-tiny | 3.89% | 30/30/0 | 6.38% | 0.27% | Yes | mps |
| 2025-12-02 11:02:45 | openai/whisper-large-v3-turbo | 4.03% | 30/30/0 | 5.47% | 1.62% | Yes | mps |
| 2025-12-02 14:24:32 | Trelis/transcribe-en_us-spelling-v1-turbo | 1.25% | 30/30/0 | 6.84% | 0.07% | Yes | mps |

LibriSpeech Performance

LibriSpeech is used here as an independent check on the degradation caused by fine-tuning. Smaller models tend to degrade more when fine-tuned; there is no evidence of degradation on the turbo model:

Dataset: openslr/librispeech_asr
Config: other
Split: test
Text Column: text

| Timestamp | Model | WER % | Samples (Eval/Total/Skipped) | US→GB % | GB→US % | Normalized | Device |
|---|---|---|---|---|---|---|---|
| 2025-12-02 09:27:52 | openai/whisper-tiny | 11.62% | 50/50/0 | 0.00% | 0.00% | Yes | mps |
| 2025-12-02 12:17:18 | Trelis/transcribe-en_gb-spelling-v1-tiny | 13.18% | 50/50/0 | 0.00% | 0.00% | Yes | mps |
| 2025-12-02 13:44:04 | Trelis/transcribe-en_us-spelling-v1-tiny | 12.40% | 50/50/0 | 0.00% | 0.00% | Yes | mps |
| 2025-11-27 13:23:00 | openai/whisper-large-v3-turbo | 4.47% | 50/50/0 | 0.00% | 0.00% | Yes | mps |
| 2025-12-02 13:24:33 | Trelis/transcribe-en_gb-spelling-v1-turbo | 4.02% | 50/50/0 | 0.00% | 0.00% | Yes | mps |
| 2025-12-02 14:37:54 | Trelis/transcribe-en_us-spelling-v1-turbo | 4.13% | 50/50/0 | 0.00% | 0.00% | Yes | mps |

Inference

Quick Demo (3 samples)

Copy/paste to transcribe the first three rows from a HuggingFace dataset with Trelis/transcribe-en_us-spelling-v1-tiny-ctranslate2:

uv run --isolated --with transformers --with 'datasets<3.0' --with soundfile --with librosa --with torchaudio python - <<'PY'
from datasets import load_dataset
from transformers import pipeline

DATASET_ID = "Trelis/transcribe-to-en_GB-v1"
MODEL_ID = "Trelis/transcribe-en_us-spelling-v1-tiny-ctranslate2"

print(f"Loading dataset: {DATASET_ID} (first 3 rows)")
dataset = load_dataset(DATASET_ID, split="test[:3]")

print(f"Loading ASR model: {MODEL_ID}")
asr = pipeline("automatic-speech-recognition", model=MODEL_ID, return_timestamps="word")

for idx, sample in enumerate(dataset):
    audio = sample["audio"]
    transcription = asr(
        {"array": audio["array"], "sampling_rate": audio["sampling_rate"]}
    )
    print(f"\nSample {idx + 1}")
    print(f"  Reference: {sample.get('text')}")
    print(f"  Transcript: {transcription['text']}")
PY

Make sure you have Hugging Face access to both the dataset and model (huggingface-cli login).

Transcribe your own audio (/path/to/audio.wav):

uv run --isolated --with transformers --with 'datasets<3.0' --with soundfile --with librosa --with torchaudio python - <<'PY'
from transformers import pipeline
import torchaudio

MODEL_ID = "Trelis/transcribe-en_us-spelling-v1-tiny-ctranslate2"
audio_path = "/path/to/audio.wav"  # change me

audio, sr = torchaudio.load(audio_path)
waveform = audio.mean(dim=0)  # downmix to mono if the file is stereo
asr = pipeline("automatic-speech-recognition", model=MODEL_ID, return_timestamps="word")
result = asr({"array": waveform.numpy(), "sampling_rate": sr})
print(f"Transcript: {result['text']}")
PY

Bulk README Uploads

Render/push README files for multiple repos listed in model_info/readme_targets.yaml:

# Preview rendered files in model_info/generated_readmes/
uv run --with pyyaml --with huggingface_hub python model_info/push_readmes.py

# Push READMEs to HuggingFace Hub (requires huggingface-cli login)
uv run --with pyyaml --with huggingface_hub python model_info/push_readmes.py --push

Each entry in readme_targets.yaml may optionally override base_model and stripe_link.
The model name (e.g. transcribe-en_us-spelling-v1-tiny-ctranslate2) is auto-derived from the slug; defaults exist for the tiny, small, and turbo tiers.

Server Inference

For guidance on inference, see this video.

CTranslate2 and Faster Whisper are recommended if you wish to operate a server. You can modify this one-click Runpod affiliate link to get started quickly.

Further Support

  • For model-specific questions create a post under "Community" on the repo card.
  • For support with custom fine-tuning, see trelis.com/ADVANCED-audio, or book a session here for deeper support.

Jobs

Trelis is hiring a part-time developer on contract to assist with model development. Apply here.

License & Usage (Trelis Transcribe v1 Models)

Tiny models are open for commercial use under the MIT License.

Turbo models are commercially licensed and:

  • Available for purchase by individuals or small organisations under a basic license.
  • Available for licensing for larger organisations here.

Small orgs are defined as entities with less than $1M in revenue across all of their products/services over the last year AND fewer than 25 employees.

Basic License Details (for individuals + small orgs)

Purchase gives an individual or small organisation a lifetime license to v1. Future major versions (v2, v3, …) may be sold separately.

You may:

  • Use the model for personal, academic, and research projects.
  • Use it for internal transcription (meetings, calls, training, docs, etc.).
  • Use it inside your own products and services (SaaS, apps, internal tools).
  • Run it on your own servers or embedded in your app (desktop / mobile / edge), so users transcribe audio through your app.
  • Fine-tune the model for your own internal or product use.

You may not:

  • Redistribute the original or fine-tuned weights
    • e.g. upload to other model hubs, share checkpoints, ship raw model files to clients.
  • Offer a general-purpose STT service for other developers or companies
    • e.g. “we sell an STT API anyone can build on” using these weights as the core engine.
  • Resell or rebrand the model itself (weights as a product).

On-device use is fine only as an internal component of your app. Users get features, not reusable model files.

Bigger / infrastructure use

If you:

  • Are above the size threshold defined earlier, or
  • Want to offer speech-to-text as a general-purpose API/service, or
  • Need rights to redistribute original or fine-tuned weights, or
  • Want access to larger model sizes (e.g. fine-tunes of Whisper Large v3), or
  • Want support / SLAs / early access to future versions

Kindly describe your use case here and I will respond promptly.
