---
license: apache-2.0
base_model:
- openai/whisper-large-v3
base_model_relation: quantized
pipeline_tag: automatic-speech-recognition
language:
- en
- zh
- de
- es
- ru
- ko
- fr
- ja
- pt
- tr
- pl
- ca
- nl
- ar
- sv
- it
- id
- hi
- fi
- vi
- he
- uk
- el
- ms
- cs
- ro
- da
- hu
- ta
- no
- th
- ur
- hr
- bg
- lt
- la
- mi
- ml
- cy
- sk
- te
- fa
- lv
- bn
- sr
- az
- sl
- kn
- et
- mk
- br
- eu
- is
- hy
- ne
- mn
- bs
- kk
- sq
- sw
- gl
- mr
- pa
- si
- km
- sn
- yo
- so
- af
- oc
- ka
- be
- tg
- sd
- gu
- am
- yi
- lo
- uz
- fo
- ht
- ps
- tk
- nn
- mt
- sa
- lb
- my
- bo
- tl
- mg
- as
- tt
- haw
- ln
- ha
- ba
- jw
- su
- yue
tags:
- audio
- automatic-speech-recognition
- speech-recognition
- whisper
- annthem
- qlip
- thestage
---
# Elastic model: Whisper Large v3. Fastest and most flexible models for self-hosting.
Elastic models are models produced by TheStage AI ANNA (Automated Neural Networks Accelerator). ANNA lets you control model size, latency, and quality with a simple slider movement. For each model, ANNA produces a series of optimized models:
* __XL__: Mathematically equivalent neural network, optimized with our DNN compiler.
* __L__: Near lossless model, with less than 1% degradation obtained on corresponding benchmarks.
* __M__: Faster model, with accuracy degradation less than 1.5%.
* __S__: The fastest model, with accuracy degradation less than 2%.
__Goals of elastic models:__
* Provide flexibility in cost vs quality selection for inference
* Provide clear quality and latency benchmarks for speech recognition
* Provide a `transformers`-compatible interface through the `elastic_models` library, so optimized versions can be used with a single line of code change (see the sketch below)
* Provide models supported on a wide range of hardware (NVIDIA GPUs), which are pre-compiled and require no JIT
* Provide the best models and service for self-hosting
> It's important to note that we have consolidated all elastic model versions into a single optimized S model that provides the best balance of speed and quality for Whisper Large v3.
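As a minimal sketch of that single-line change (the full, runnable example is in the Inference section below), switching from stock `transformers` to `elastic_models` looks roughly like this:
```python
# Stock Hugging Face usage:
# from transformers import WhisperForConditionalGeneration
# model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v3")

# Elastic drop-in replacement: only the import and the `mode` argument change
from elastic_models.transformers import WhisperForConditionalGeneration

model = WhisperForConditionalGeneration.from_pretrained(
    "openai/whisper-large-v3",
    mode="S",  # the single optimized version shipped for Whisper Large v3
)
```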
## Audio Examples
Below are examples demonstrating the transcription quality of the Elastic Whisper Large v3 S model compared to the original.
**Example Audio Transcriptions:**
| Audio Sample | Original Whisper Large v3 | Elastic S Model |
|---|---|---|
| <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/6799fc8e150f5a4014b030ca/io62uN1l-tpqigMlzQMlm.mpga"></audio> | joel keaton disapproved of films and buster also had reservations about the medium | joel keaton disapproved of films and buster also had reservations about the medium |
| <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/6799fc8e150f5a4014b030ca/CVabXfIP_Q5qxIjzoy5N6.mpga"></audio> | she ll be alright | she ll be alright |
| <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/6799fc8e150f5a4014b030ca/-fidVnQcCa32c7-2rNz-w.mpga"></audio> | all is well that ends well | all is well that ends well |
## Inference
To run inference with our Whisper models, use the `elastic_models.transformers.WhisperForConditionalGeneration` class.
**Example using `elastic_models` with the optimized model:**
```python
import torch
import librosa  # make sure this package is installed
from transformers import AutoProcessor, pipeline
from elastic_models.transformers import WhisperForConditionalGeneration

model_name = "openai/whisper-large-v3"
mode = "S"
audio_path = "path_to_your_audio.wav"
hf_token = "YOUR_TOKEN"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load processor and model
processor = AutoProcessor.from_pretrained(model_name, token=hf_token)
model = WhisperForConditionalGeneration.from_pretrained(
    model_name,
    token=hf_token,
    torch_dtype=torch.float16,
    mode=mode,
    device_map=device,
)
model.eval()

# Create pipeline
generator = pipeline(
    task="automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    device=device,
)

# Load audio at Whisper's expected 16 kHz sampling rate
audio, sr = librosa.load(audio_path, sr=16000)
print(f"Transcribing audio from: {audio_path}")

# Generate transcription using the pipeline
generate_kwargs = {
    "max_new_tokens": 100,
    "num_beams": 1,
}
result = generator(
    audio,
    generate_kwargs=generate_kwargs,
)
transcription = result["text"]
print(f"Transcription: {transcription}")
```
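To reproduce the side-by-side comparisons from the Audio Examples table above, the original checkpoint can be run through the stock `transformers` pipeline. This is a minimal baseline sketch, separate from the `elastic_models` tooling:
```python
import torch
import librosa
from transformers import pipeline

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
audio, sr = librosa.load("path_to_your_audio.wav", sr=16000)

# Baseline: the unmodified openai/whisper-large-v3 checkpoint
baseline = pipeline(
    task="automatic-speech-recognition",
    model="openai/whisper-large-v3",
    torch_dtype=torch.float16,
    device=device,
)
print(f"Original transcription: {baseline(audio)['text']}")
```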
__System requirements:__
* GPUs: NVIDIA GeForce RTX 4090, NVIDIA GeForce RTX 5090, NVIDIA H100, NVIDIA L40S
* CPU: AMD, Intel
* Python: 3.8-3.12 (check dependencies for specific versions)
To work with our elastic models and compilation tools, you'll need to install the `elastic_models` and `qlip` libraries from TheStage:
```shell
pip install thestage
pip install 'thestage-elastic-models[nvidia]' --extra-index-url https://thestage.jfrog.io/artifactory/api/pypi/pypi-thestage-ai-production/simple
pip install flash-attn==2.7.3 --no-build-isolation
pip install tensorrt==10.11.0.33  # for 4090
pip uninstall apex

# Or, for Blackwell (RTX 50xx) support:
pip install 'thestage-elastic-models[blackwell]' --extra-index-url https://thestage.jfrog.io/artifactory/api/pypi/pypi-thestage-ai-production/simple
pip install torch==2.7.0+cu128 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
# Download the appropriate flash-attention wheel for your system from https://github.com/Zarrac/flashattention-blackwell-wheels-whl-ONLY-5090-5080-5070-5060-flash-attention-/releases/tag/FlashAttention
mv flash_attn-2.7.4.post1-rtx5090-torch2.7.0cu128cxx11abiTRUE-cp311-linux_x86_64.whl flash_attn-2.7.4.post1-0rtx5090torch270cu128cxx11abiTRUE-cp311-cp311-linux_x86_64.whl
pip install flash_attn-2.7.4.post1-0rtx5090torch270cu128cxx11abiTRUE-cp311-cp311-linux_x86_64.whl
pip install tensorrt==10.11.0.33
pip uninstall apex
```
Then go to [app.thestage.ai](https://app.thestage.ai), log in, and generate an API token from your profile page. Set the API token as follows:
```shell
thestage config set --api-token <YOUR_API_TOKEN>
```
Congrats, now you can use accelerated models and tools!
----
## Benchmarks
Benchmarking is one of the most important procedures during model acceleration. We aim to provide clear performance metrics for Whisper models accelerated with our algorithms.
### Quality benchmarks
Performance evaluation on standard speech recognition benchmarks:
| Metric/Model | S | Original |
|--------------|---|----------|
| WER (Common Voice) | 0.18 | 0.22 |
* **WER (Word Error Rate)**: The primary metric for evaluating speech recognition accuracy. Lower is better (see the sketch after this list for how it is computed).
* **Common Voice**: Multilingual speech recognition benchmark covering diverse languages and accents.
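For illustration only, WER can be computed with the third-party `jiwer` package (an assumption in this sketch; it is not part of our tooling), scoring a hypothesis transcript against a reference:
```python
# pip install jiwer
import jiwer

reference = "joel keaton disapproved of films and buster also had reservations about the medium"
hypothesis = "joel keaton disapproved of films and buster also had reservations about the medium"

# WER = (substitutions + deletions + insertions) / number of reference words
print(f"WER: {jiwer.wer(reference, hypothesis):.2f}")  # 0.00 for an exact match
```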
### Latency benchmarks (tps)
Transcription throughput in tokens per second (tps); higher is better. A rough way to reproduce such measurements is sketched after the table.
**Batch Size 1:**
| GPU Type | S | Original |
|----------|---|----------|
| H100 | 223.47 | 82.84 |
| L40S | 210.67 | 72.36 |
| GeForce RTX 4090 | 240 | 86.63 |
| GeForce RTX 5090 | 265.93 | 195.76 |
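Numbers of this kind can be approximated by timing generation and dividing the count of generated decoder tokens by the elapsed time. The sketch below is a rough, illustrative measurement (not our official benchmarking procedure) that reuses `model`, `processor`, `audio`, and `device` from the inference example above:
```python
import time
import torch

# Prepare input features for a single 16 kHz audio clip
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")
input_features = inputs.input_features.to(device, dtype=torch.float16)

# Time one generation pass (a warm-up call beforehand gives fairer numbers)
if device.type == "cuda":
    torch.cuda.synchronize()
start = time.perf_counter()
generated_ids = model.generate(input_features, max_new_tokens=100, num_beams=1)
if device.type == "cuda":
    torch.cuda.synchronize()
elapsed = time.perf_counter() - start

# Approximate tokens per second, counting all generated decoder tokens
print(f"~{generated_ids.shape[-1] / elapsed:.1f} tokens/s")
```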
## Links
* __Platform__: [app.thestage.ai](https://app.thestage.ai)
* __Subscribe for updates__: [TheStageAI X (Twitter)](https://x.com/TheStageAI)
* __Contact email__: [email protected] |