Leonoravesterbacka committed
Commit 6aafecc · verified · 1 Parent(s): 33cee58

Update README.md

Files changed (1)
  1. README.md +22 -0
README.md CHANGED
@@ -33,6 +33,27 @@ Table: **Word Error Rate (WER)** comparison between KBLab's Whisper models and t
 
 We provide checkpoints in different formats: `Hugging Face`, `whisper.cpp` (GGML), `onnx`, and `ctranslate2` (used in `faster-whisper` and `WhisperX`).
 
+ ### 2025-05-13 Update!
+ The default when loading our models through Hugging Face is **Stage 2**.
+ As of May 2025, there are two additional **Stage 2** versions alongside the default, namely **Subtitle** and **Strict**, which specify the transcription style.
+ Specifying `revision="subtitle"` in `.from_pretrained()` loads the version with a more condensed transcription style.
+ Specifying `revision="strict"` loads the more verbatim version of the model.
+ Below is an example of how this argument is passed to `.from_pretrained()`:
+ ```python
+ import torch
+ from datasets import load_dataset
+ from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
+
+ device = "cuda:0" if torch.cuda.is_available() else "cpu"
+ torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32
+ model_id = "KBLab/kb-whisper-large"
+ model = AutoModelForSpeechSeq2Seq.from_pretrained(
+     model_id, torch_dtype=torch_dtype, use_safetensors=True, cache_dir="cache", revision="strict"
+ )
+ ```
+ The verbosity of the three model versions ranges from the least verbose **Subtitle**, through the default **Stage 2**, to the most verbose **Strict**.
+
+
 #### Hugging Face
 
 Inference example for using `KB-Whisper` with Hugging Face:
@@ -66,6 +87,7 @@ generate_kwargs = {"task": "transcribe", "language": "sv"}
 res = pipe("audio.mp3",
            chunk_length_s=30,
            generate_kwargs={"task": "transcribe", "language": "sv"})
+ print(res)
 ```
 
 #### Faster-whisper
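
For reference, a minimal sketch (not part of this commit) of how the `revision="subtitle"` variant described in the update could be combined with the README's existing pipeline-based inference. Assumptions: `audio.mp3` is a placeholder input, and the processor files are shared with the default revision:

```python
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32
model_id = "KBLab/kb-whisper-large"

# Load the more condensed "subtitle" revision instead of the default Stage 2 weights.
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, use_safetensors=True, revision="subtitle"
)
# Assumption: the tokenizer and feature extractor are shared across revisions.
processor = AutoProcessor.from_pretrained(model_id)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    torch_dtype=torch_dtype,
    device=device,
)

# "audio.mp3" is a placeholder; chunking and generate_kwargs mirror the README example.
res = pipe("audio.mp3",
           chunk_length_s=30,
           generate_kwargs={"task": "transcribe", "language": "sv"})
print(res["text"])
```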