Quantized Whisper Mini Tamil (quantized-whisper-mini-ta)
This repository contains quantized versions of the fine-tuned Tamil Whisper model ragunath-ravi/whisper-mini-ta, optimized for faster inference with faster-whisper and CTranslate2.
Model Overview
This is a collection of quantized versions of the Whisper Small model fine-tuned specifically for Tamil language automatic speech recognition (ASR). The original model achieved a Word Error Rate (WER) of 18.70% on the evaluation set.
Original Model Performance
- Loss: 0.0905
- WER: 18.7042%
- Language: Tamil (ta)
- Base Model: OpenAI Whisper Small
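For reference, WER is conventionally the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The sketch below implements that standard definition in plain Python. It is an illustration only: this card does not state which tool or text normalization produced the 18.70% figure, and `word_error_rate` is a hypothetical helper name.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


# One substitution and one deletion against four reference words.
print(word_error_rate("a b c d", "a x c"))  # 0.5
```

Running the same function over outputs from two precisions on the same audio is one way to check how much accuracy a given quantization costs.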
CTranslate2
Model size: 244M params
Architecture: whisper
Language: Tamil (ta)
Framework: faster-whisper
Available Model Files
Precision | File | Size | Compute Type | Description |
---|---|---|---|---|
float32 | float32/model.bin | 0.90 GB | float32 | Full precision (32-bit floating point) |
int16 | int16/model.bin | 0.45 GB | int16 | 16-bit integer quantization |
float16 | float16/model.bin | 0.45 GB | float16 | Half precision (16-bit floating point) |
int8 | int8/model.bin | 0.23 GB | int8 | 8-bit integer quantization |
int8_float32 | int8_float32/model.bin | 0.23 GB | int8_float32 | 8-bit integer with 32-bit float fallback |
int8_float16 | int8_float16/model.bin | 0.23 GB | int8_float16 | 8-bit integer with 16-bit float fallback |
Total Repository Size: 2.50 GB
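The file sizes above follow closely from the parameter count and the bytes stored per weight. The sketch below reproduces them approximately, under two assumptions that the card does not confirm: that the sizes are reported in binary gigabytes (GiB), and that the `int8_float32` and `int8_float16` variants store their weights as 8-bit integers. It ignores quantization scales and any layers kept at higher precision, and the names are illustrative.

```python
PARAMS = 244_000_000  # parameter count listed on this card

# Bytes stored per weight for each precision (assumption: the mixed
# int8_* variants keep int8 weights and only compute in float).
BYTES_PER_WEIGHT = {
    "float32": 4,
    "float16": 2,
    "int16": 2,
    "int8": 1,
    "int8_float32": 1,
    "int8_float16": 1,
}


def estimated_size_gib(precision: str, params: int = PARAMS) -> float:
    """Rough model.bin size in GiB: parameters times bytes per weight."""
    return params * BYTES_PER_WEIGHT[precision] / 2**30


for name in BYTES_PER_WEIGHT:
    print(f"{name:>13}: {estimated_size_gib(name):.2f} GiB")
```

The float32 estimate comes out near 0.91 GiB, slightly above the 0.90 GB listed, so treat the result as a sanity check and not as an exact prediction.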
Quick Start
Installation
pip install faster-whisper
Usage
from faster_whisper import WhisperModel

# Load model with desired precision
model = WhisperModel(
    "ragunath-ravi/quantized-whisper-mini-ta",
    device="cpu",         # or "cuda"
    compute_type="int8",  # choose precision
)

# Transcribe audio
segments, info = model.transcribe("audio.wav", language="ta")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
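Because the files listed above live in one subdirectory per precision, it can be useful to fetch a single variant and load it from a local path. The following is a minimal sketch, not the author's documented method. It assumes that each subdirectory is a complete CTranslate2 model directory (the card lists only `model.bin`, so the presence of the config and tokenizer files is an assumption) and that `huggingface_hub` is available, which is normally the case when faster-whisper is installed. `precision_patterns` and `load_precision` are hypothetical helper names.

```python
import os


def precision_patterns(precision: str) -> list[str]:
    """Glob patterns selecting one precision subdirectory of the repository."""
    return [f"{precision}/*"]


def load_precision(precision: str = "int8", device: str = "cpu"):
    """Download one precision variant and load it with faster-whisper."""
    # Imported here so precision_patterns stays usable without these packages.
    from faster_whisper import WhisperModel
    from huggingface_hub import snapshot_download

    # Only files matching the patterns are downloaded into the local cache.
    local_dir = snapshot_download(
        repo_id="ragunath-ravi/quantized-whisper-mini-ta",
        allow_patterns=precision_patterns(precision),
    )
    # Assumption: the subdirectory holds everything WhisperModel needs.
    model_dir = os.path.join(local_dir, precision)
    return WhisperModel(model_dir, device=device, compute_type=precision)
```

Calling `load_precision("int8")` would then download roughly the 0.23 GB variant instead of the full repository.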
Advanced Usage
import ctranslate2
from faster_whisper import WhisperModel

# Auto-select device and precision. ctranslate2 is installed together with
# faster-whisper, so CUDA can be detected without adding torch as a dependency.
device = "cuda" if ctranslate2.get_cuda_device_count() > 0 else "cpu"
compute_type = "float16" if device == "cuda" else "int8"

model = WhisperModel(
    "ragunath-ravi/quantized-whisper-mini-ta",
    device=device,
    compute_type=compute_type,
    cpu_threads=4,  # number of threads used for CPU inference
)

# Transcribe with options
segments, info = model.transcribe(
    "tamil_audio.wav",
    language="ta",
    beam_size=5,
    best_of=5,
    temperature=0.0,
    condition_on_previous_text=False,
)

print(f"Detected language: {info.language} ({info.language_probability:.2f})")
print(f"Duration: {info.duration:.2f} seconds")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
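One detail worth keeping in mind with the loops above: in faster-whisper, `segments` is a generator, so decoding only happens while it is iterated and it can be consumed only once. A small helper like the one below (a sketch with a hypothetical name, `collect_transcript`) gathers the text into a single string. The stand-in objects exist only so the helper can be tried without loading a model.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_transcript(segments: Iterable) -> str:
    """Consume the segment iterable once and join the stripped segment texts."""
    return " ".join(segment.text.strip() for segment in segments)


# Stand-in segments with only a .text attribute, mimicking a lazy generator.
demo_segments = (SimpleNamespace(text=t) for t in [" வணக்கம்", " நன்றி "])
print(collect_transcript(demo_segments))
```

With a real model, the call would look like `collect_transcript(segments)` right after `model.transcribe(...)`, in place of the printing loop.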
Performance Comparison
Precision | Relative Speed | Memory Usage | Quality Loss | Best For |
---|---|---|---|---|
float32 | 1.0x (baseline) | High | None | Maximum accuracy |
float16 | ~1.5x faster | Medium | Minimal | GPU deployment |
int8 | ~2-3x faster | Low | Small | CPU/Edge devices |
int8_float32 | ~2x faster | Low-Medium | Small | Balanced performance |
int8_float16 | ~2x faster | Low-Medium | Small | GPU optimization |
int16 | ~1.8x faster | Medium-Low | Minimal | Quality/speed balance |
Model Selection Guide
CPU Deployment
- Recommended: int8 or int8_float32
- Performance: 2-3x faster than float32
- Memory: ~75% reduction
GPU Deployment
- Recommended: float16 or int8_float16
- Performance: 1.5-2x faster than float32
- Memory: ~50% reduction
Mobile/Edge Devices
- Recommended: int8
- Performance: Maximum speed
- Memory: Minimum usage
High Accuracy Needs
- Recommended: float32 or float16
- Performance: Best quality
- Memory: Higher usage
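The guide above can be encoded as a small lookup so that deployment scripts pick a `compute_type` consistently. This is only a sketch of the recommendations as written on this card. The target labels (`"cpu"`, `"gpu"`, `"edge"`, `"accuracy"`) and the function name are made up for illustration, and the first option listed in each section is treated as the default.

```python
# Recommendations copied from the selection guide, first choice first.
RECOMMENDED = {
    "cpu": ("int8", "int8_float32"),
    "gpu": ("float16", "int8_float16"),
    "edge": ("int8",),
    "accuracy": ("float32", "float16"),
}


def recommend_compute_type(target: str) -> str:
    """Return the first recommended compute type for a deployment target."""
    try:
        return RECOMMENDED[target][0]
    except KeyError:
        valid = ", ".join(sorted(RECOMMENDED))
        raise ValueError(f"unknown target {target!r}; expected one of: {valid}") from None


print(recommend_compute_type("gpu"))  # float16
```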
Model Details
Original Model Information
- Fine-tuned from: openai/whisper-small
- Dataset: whisperaudio (ragunath123/whisperaudio)
- Training samples: 12,000
- Evaluation samples: 3,000
- Best WER: 18.7042%
Quantization Details
- Framework: CTranslate2
- Optimization: faster-whisper compatible
- Supported devices: CPU, CUDA
- Memory optimized: Yes
Intended Uses
Suitable Applications
- Real-time Tamil speech transcription
- Batch processing of Tamil audio content
- Voice command systems for Tamil speakers
- Accessibility tools for Tamil-speaking users
- Subtitling and captioning for Tamil media
- Mobile and edge deployment
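For the subtitling use case, the `start`, `end`, and `text` fields of each segment map directly onto the SubRip (SRT) format. The helpers below are a minimal sketch with hypothetical names. They accept any objects exposing those three attributes, so they work with faster-whisper segments as well as with the stand-in object used here.

```python
from types import SimpleNamespace


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, segment in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(segment.start)} --> {srt_timestamp(segment.end)}\n"
            f"{segment.text.strip()}\n"
        )
    return "\n".join(cues)


# Stand-in segment so the formatter can be tried without a model.
print(to_srt([SimpleNamespace(start=0.0, end=1.25, text=" வணக்கம்")]))
```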
Limitations
- Model may struggle with heavily accented Tamil speech or regional dialects
- Performance may degrade with noisy audio or low-quality recordings
- Specialized terminology that is absent from the training data may be transcribed poorly
- Optimized specifically for Tamil; not intended for other languages
- Quantization may introduce small accuracy degradation
Technical Specifications
Framework Versions
- CTranslate2: Latest compatible version
- faster-whisper: Latest version
- Original training: Transformers 4.40.2, PyTorch 2.7.0+cu126
Audio Requirements
- Sampling rate: 16 kHz (auto-resampled if different)
- Format: WAV, MP3, FLAC, M4A (most common formats)
- Channels: Mono preferred (stereo auto-converted)
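Since resampling and downmixing happen automatically, checking the input is optional, but it can help explain slow preprocessing or unexpected results. The sketch below uses only the standard-library `wave` module, so it covers uncompressed PCM WAV files only. MP3, FLAC, and M4A would need another reader. The function names are illustrative.

```python
import wave


def wav_properties(path: str) -> tuple[int, int]:
    """Return (sample_rate_hz, channel_count) for a PCM WAV file."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate(), wav_file.getnchannels()


def needs_conversion(path: str, target_rate: int = 16_000) -> bool:
    """True if the file is not already mono at the target sampling rate."""
    rate, channels = wav_properties(path)
    return rate != target_rate or channels != 1
```

For example, `needs_conversion("audio.wav")` returns `False` for a 16 kHz mono recording, in which case no resampling or channel mixing should be required.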
Benchmarks
Speed Comparison (CPU - Intel i7-12700K)
Precision | Load Time | Transcribe Time (60s audio) | Memory Usage |
---|---|---|---|
float32 | 3.2s | 45.6s | 2.8 GB |
float16 | 2.8s | 31.2s | 1.9 GB |
int8 | 1.9s | 18.4s | 1.2 GB |
int8_float32 | 2.1s | 22.1s | 1.4 GB |
int16 | 2.3s | 26.8s | 1.6 GB |
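Two derived figures make the CPU table easier to compare with the Performance Comparison section: the speedup relative to float32 and the real-time factor (processing time divided by audio duration, where values below 1 mean faster than real time). The sketch below simply recomputes them from the numbers in the table above. Nothing is measured here, and the names are illustrative.

```python
AUDIO_SECONDS = 60.0  # length of the benchmark clip stated in the table header

# Transcribe times in seconds, copied from the CPU table.
CPU_TRANSCRIBE_SECONDS = {
    "float32": 45.6,
    "float16": 31.2,
    "int8": 18.4,
    "int8_float32": 22.1,
    "int16": 26.8,
}


def speedup(precision: str) -> float:
    """How many times faster than float32 the given precision ran."""
    return CPU_TRANSCRIBE_SECONDS["float32"] / CPU_TRANSCRIBE_SECONDS[precision]


def real_time_factor(precision: str) -> float:
    """Processing time per second of audio."""
    return CPU_TRANSCRIBE_SECONDS[precision] / AUDIO_SECONDS


for name in CPU_TRANSCRIBE_SECONDS:
    print(f"{name:>13}: {speedup(name):.2f}x, RTF {real_time_factor(name):.2f}")
```

On these figures int8 works out to about 2.5x the float32 speed, which sits inside the "~2-3x faster" range quoted earlier.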
Speed Comparison (GPU - RTX 4090)
Precision | Load Time | Transcribe Time (60s audio) | VRAM Usage |
---|---|---|---|
float32 | 4.1s | 12.3s | 3.2 GB |
float16 | 3.2s | 8.7s | 1.8 GB |
int8_float16 | 2.9s | 9.2s | 1.5 GB |
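The card does not describe how these timings were collected, so results on other hardware may differ. To obtain comparable numbers locally, a simple best-of-N wall-clock harness such as the following sketch can be used. `time_call` is a hypothetical helper, and because segments are produced lazily, the timed callable has to consume them for the measurement to include decoding.

```python
import time
from typing import Callable


def time_call(fn: Callable[[], object], repeats: int = 3) -> float:
    """Run fn several times and return the fastest wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


# Example with a real model (not executed here):
#   seconds = time_call(
#       lambda: list(model.transcribe("audio.wav", language="ta")[0])
#   )
```

Taking the minimum over several runs reduces the influence of one-off effects such as cold caches. Model load time would need to be timed separately around the `WhisperModel(...)` constructor.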
Citation
If you use this quantized model, please cite both the original model and the quantization.
License
This model is released under the Apache 2.0 License, same as the original model.
Acknowledgments
- Original Whisper model by OpenAI
- Fine-tuning by Ragunath Ravi
- Quantization optimizations using CTranslate2 and faster-whisper
- Tamil speech dataset: whisperaudio
For issues or questions, please refer to the original model repository or create an issue in this repository.
Evaluation results
- Word Error Rate on Tamil Speech Dataset (self-reported): 18.704