Update README.md
We provide checkpoints in different formats: `Hugging Face`, `whisper.cpp` (GGML), `onnx`, and `ctranslate2` (used in `faster-whisper` and `WhisperX`).

### 2025-05-13 Update!

The default when loading our models through Hugging Face is **Stage 2**. As of May 2025, two additional **Stage 2** versions are available alongside the default, namely **Subtitle** and **Strict**, which specify the transcription style. Specifying `revision="subtitle"` in `.from_pretrained()` loads the version with a more condensed transcription style, while `revision="strict"` loads the more verbatim-like version. Below is an example of how this argument is passed to `.from_pretrained()`:

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32
model_id = "KBLab/kb-whisper-large"
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, use_safetensors=True, cache_dir="cache", revision="strict"
)
```

The verbosity of the three transcription styles ranges from the least verbose, **Subtitle**, through the default **Stage 2**, to the most verbose, **Strict**.
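
The same `revision` argument can also be passed when constructing a `transformers` pipeline directly. Below is a minimal sketch (not taken from the README) assuming the standard `pipeline()` API; `audio.mp3` is a placeholder file name:

```python
from transformers import pipeline

# Sketch (assumption): load the more condensed "subtitle" revision directly
# through the ASR pipeline helper rather than via from_pretrained().
pipe = pipeline(
    "automatic-speech-recognition",
    model="KBLab/kb-whisper-large",
    revision="subtitle",
)
res = pipe("audio.mp3",
           chunk_length_s=30,
           generate_kwargs={"task": "transcribe", "language": "sv"})
print(res)
```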
#### Hugging Face
Inference example for using `KB-Whisper` with Hugging Face:
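
The `res = pipe(...)` call below assumes a `pipe` object has already been constructed. A minimal sketch of that setup, assuming the standard `transformers` ASR pipeline and the same model loading shown above (the arguments in the full example may differ):

```python
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32
model_id = "KBLab/kb-whisper-large"

# Assumed setup: load model and processor, mirroring the .from_pretrained() example above
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, use_safetensors=True, cache_dir="cache"
)
processor = AutoProcessor.from_pretrained(model_id)

# Build the ASR pipeline used in the snippet below
pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    torch_dtype=torch_dtype,
    device=device,
)
```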

```python
res = pipe("audio.mp3",
           chunk_length_s=30,
           generate_kwargs={"task": "transcribe", "language": "sv"})
print(res)
```
#### Faster-whisper