# faster-whisper-large-v3-int8-ct2
This repository contains the OpenAI Whisper Large v3 model converted to the CTranslate2 format with int8 quantization. The conversion makes inference significantly faster and more memory-efficient, with a minimal trade-off in accuracy. CTranslate2 is an inference engine for Transformer models developed by the OpenNMT project.
## Model Details

- **Base model:** `openai/whisper-large-v3`
- **Format:** CTranslate2
- **Quantization:** int8
## How to Use

You can use this model with the faster-whisper library. First, install it:

```bash
pip install faster-whisper
```

Then use the model in your Python code:
```python
from faster_whisper import WhisperModel

model_path = "groxaxo/faster-whisper-large-v3-int8-ct2"

# Run on GPU with FP16:
# model = WhisperModel(model_path, device="cuda", compute_type="float16")
# or run on GPU with INT8:
# model = WhisperModel(model_path, device="cuda", compute_type="int8_float16")
# or run on CPU with INT8:
model = WhisperModel(model_path, device="cpu", compute_type="int8")

segments, info = model.transcribe("audio.mp3", beam_size=5)

print("Detected language '%s' with probability %f" % (info.language, info.language_probability))

for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
```
Replace `"audio.mp3"` with the path to your audio file.
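Each segment exposes `start`, `end`, and `text`, so a common next step is writing the transcript to a subtitle file. A minimal sketch of SRT output, assuming segments are `(start, end, text)` tuples like the attributes returned above (the helper names here are illustrative, not part of faster-whisper):

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments) -> str:
    """Render (start, end, text) tuples as numbered SRT cue blocks."""
    blocks = []
    for i, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    return "\n".join(blocks)

# Hard-coded segments for illustration; a real run would iterate model.transcribe output.
print(segments_to_srt([(0.0, 2.5, "Hello world."), (2.5, 5.0, "Second line.")]))
```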
## Conversion

The model was converted using the `ct2-transformers-converter` tool from the CTranslate2 project:

```bash
ct2-transformers-converter --model openai/whisper-large-v3 \
    --output_dir faster-whisper-large-v3-int8-ct2 \
    --quantization int8 \
    --copy_files tokenizer.json preprocessor_config.json
```
## Disclaimer

Quantization can have a small impact on the model's accuracy. While int8 quantization is generally safe and offers a good balance of performance and accuracy, you should evaluate the model on your specific task to ensure it meets your requirements.
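One simple way to run that evaluation is to compare the int8 model's transcripts against reference transcripts using word error rate (WER). A minimal self-contained WER sketch (libraries such as jiwer provide a more complete implementation, including text normalization):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + insertions + deletions) / reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words via dynamic programming.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1] / max(len(ref), 1)

print(wer("the quick brown fox", "the quick brown fox"))  # 0.0
print(wer("the quick brown fox", "the quick fox"))        # 0.25 (one deletion)
```

Comparing the WER of this int8 model against the original FP16 model on a held-out sample of your own audio gives a direct measure of what the quantization costs on your task.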