A newer version of this model is available: primeline/whisper-tiny-german-1224

whisper-tiny-german

This is a German speech recognition model based on the whisper-tiny model. It has 37.8M parameters, and its weights take about 73MB in bfloat16 format.
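As a rough sanity check on those figures, the weight size can be estimated from the parameter count and the 2 bytes that bfloat16 uses per value. This is only a back-of-the-envelope sketch: it ignores file metadata and any tensors stored in another dtype, and the helper name `weights_size_bytes` is made up for illustration.

```python
def weights_size_bytes(num_params: float, bytes_per_param: int) -> float:
    """Approximate size of the raw weights, ignoring file metadata."""
    return num_params * bytes_per_param


NUM_PARAMS = 37.8e6  # parameter count stated in this model card
BF16_BYTES = 2       # bfloat16 stores each value in 16 bits

size = weights_size_bytes(NUM_PARAMS, BF16_BYTES)
size_mb = size / 1e6      # decimal megabytes
size_mib = size / 2**20   # binary mebibytes

print(f"{size_mb:.1f} MB, {size_mib:.1f} MiB")
```

The estimate lands between roughly 72 and 76 depending on the unit, which is in line with the quoted 73MB.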

As a follow-up to Whisper large v3 german, we created a tiny version for edge cases where model size is a concern.

Intended uses & limitations

The model is intended for German speech recognition tasks, specifically edge cases where model size is a concern. It is not recommended for critical use cases: it is a tiny model and may not perform well in all scenarios.

Dataset

The training data is a filtered subset of the Common Voice dataset, Multilingual LibriSpeech, and some internal data. The data was filtered and double-checked for quality and correctness. The text was also normalized, especially for casing and punctuation.
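The exact normalization rules are not documented here. The sketch below only illustrates what casing and punctuation normalization can look like for German text. The function `normalize_text` and its specific rules (lowercasing, replacing punctuation with spaces, collapsing whitespace) are assumptions, not the authors' pipeline.

```python
import re


def normalize_text(text: str) -> str:
    """Hypothetical normalization: lowercase, drop punctuation, collapse spaces."""
    text = text.lower()
    # \w is Unicode-aware in Python 3, so umlauts and ß are kept as letters.
    text = re.sub(r"[^\w\s]", " ", text)
    # Collapse runs of whitespace left behind by the removed punctuation.
    return re.sub(r"\s+", " ", text).strip()


print(normalize_text("Hallo,  Welt! Schöne Grüße."))
```

A similar step is often applied to both reference and predicted transcripts before comparing them, so that differences in casing or punctuation are not counted as recognition errors.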

Model family

| Model | Parameters | Link |
|---|---|---|
| Whisper large v3 german | 1.54B | link |
| Whisper large v3 turbo german | 809M | link |
| Distil-whisper large v3 german | 756M | link |
| tiny whisper | 37.8M | link |

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 3e-05
  • total_train_batch_size: 512
  • num_epochs: 5.0
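In the Transformers trainer, the total train batch size is the product of the per-device batch size, the gradient accumulation steps, and the number of devices. The split behind the 512 reported above is not stated, so the values below are purely hypothetical and only show one combination that reaches that total.

```python
def total_train_batch_size(per_device: int, grad_accum_steps: int, num_devices: int) -> int:
    """Effective number of samples contributing to one optimizer step."""
    return per_device * grad_accum_steps * num_devices


# Hypothetical split; the model card only states the total of 512.
PER_DEVICE = 32
GRAD_ACCUM_STEPS = 2
NUM_DEVICES = 8

total = total_train_batch_size(PER_DEVICE, GRAD_ACCUM_STEPS, NUM_DEVICES)
print(total)
```

Other splits work equally well as long as the product stays the same, for example a smaller per-device batch with more accumulation steps on memory-constrained hardware.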

Framework versions

  • Transformers 4.39.3
  • Pytorch 2.3.0a0+ebedce2
  • Datasets 2.18.0
  • Tokenizers 0.15.2

How to use

import torch
from datasets import load_dataset
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

# Use the GPU with half precision when available, otherwise CPU with float32.
device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "primeline/whisper-tiny-german"

# Load the model weights and move them to the selected device.
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)

# The processor bundles the tokenizer and the feature extractor.
processor = AutoProcessor.from_pretrained(model_id)

# Build a chunked speech recognition pipeline (30-second audio windows).
pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    max_new_tokens=128,
    chunk_length_s=30,
    batch_size=16,
    return_timestamps=True,
    torch_dtype=torch_dtype,
    device=device,
)

# Demo sample. Note: this dataset contains English audio, so substitute
# your own German recording to get meaningful transcriptions.
dataset = load_dataset("distil-whisper/librispeech_long", "clean", split="validation")
sample = dataset[0]["audio"]

result = pipe(sample)
print(result["text"])
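Because the pipeline above is created with return_timestamps=True, the result also carries a "chunks" list next to "text", where each chunk has a "timestamp" pair and its own "text". The helper below is a small sketch for printing those chunks line by line. The function names and the `example` dictionary are made up for illustration, and the sample values are not real model output.

```python
def format_timestamp(seconds):
    """Render seconds as MM:SS.ss; the pipeline may report None for an open end."""
    if seconds is None:
        return "?"
    minutes, secs = divmod(seconds, 60)
    return f"{int(minutes):02d}:{secs:05.2f}"


def format_chunks(result):
    """Turn the pipeline's timestamped chunks into readable lines."""
    lines = []
    for chunk in result.get("chunks", []):
        start, end = chunk["timestamp"]
        span = f"[{format_timestamp(start)} - {format_timestamp(end)}]"
        lines.append(f"{span} {chunk['text'].strip()}")
    return lines


# Hand-written stand-in shaped like a pipeline result (hypothetical values).
example = {
    "text": " Guten Tag. Wie geht es Ihnen?",
    "chunks": [
        {"timestamp": (0.0, 2.5), "text": " Guten Tag."},
        {"timestamp": (2.5, None), "text": " Wie geht es Ihnen?"},
    ],
}

lines = format_chunks(example)
for line in lines:
    print(line)
```

With the real pipeline, `format_chunks(result)` can be called directly on the `result` object from the snippet above.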

About us

primeline AI

Your partner for AI infrastructure in Germany

Experience the powerful AI infrastructure that drives your ambitions in Deep Learning, Machine Learning & High-Performance Computing.

Optimized for AI training and inference.

Model author: Florian Zimmermeister

Disclaimer

This model is not a product of the primeLine Group. 

It represents research conducted by [Florian Zimmermeister](https://huggingface.co/flozi00), with computing power sponsored by primeLine. 

The model is published under this account by primeLine, but it is not a commercial product of primeLine Solutions GmbH.

Please be aware that while we have tested and developed this model to the best of our abilities, errors may still occur. 

Use of this model is at your own risk. We do not accept liability for any incorrect outputs generated by this model.