Quantized Whisper Mini Tamil (quantized-whisper-mini-ta)

This repository contains quantized versions of the fine-tuned Tamil Whisper model ragunath-ravi/whisper-mini-ta, optimized for faster inference with faster-whisper and CTranslate2.

Model Overview

This collection contains quantized versions of a Whisper Small model fine-tuned for Tamil automatic speech recognition (ASR). The original fine-tuned model achieved a Word Error Rate (WER) of 18.70% on its evaluation set.

Original Model Performance

Loss: 0.0905
WER: 18.7042%
Language: Tamil (ta)
Base Model: OpenAI Whisper Small
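
For readers unfamiliar with the metric, WER is the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The sketch below is a minimal illustration of that definition; the function name `word_error_rate` is hypothetical, and it is an assumption that the reported 18.70% was computed this way (evaluation scripts often apply text normalization first).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Dynamic-programming edit distance over words, keeping one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1] / len(ref)
```

With this definition, one wrong word in a four-word reference gives a WER of 0.25.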

🚀 CTranslate2

Model size: 244M params
Architecture: whisper
Language: Tamil (ta)
Framework: faster-whisper

Available Model Files

Precision	File	Size	Compute Type	Description
float32	`float32/model.bin`	0.90 GB	`float32`	Full precision (32-bit floating point)
int16	`int16/model.bin`	0.45 GB	`int16`	16-bit integer quantization
float16	`float16/model.bin`	0.45 GB	`float16`	Half precision (16-bit floating point)
int8	`int8/model.bin`	0.23 GB	`int8`	8-bit integer quantization
int8_float32	`int8_float32/model.bin`	0.23 GB	`int8_float32`	8-bit integer with 32-bit float fallback
int8_float16	`int8_float16/model.bin`	0.23 GB	`int8_float16`	8-bit integer with 16-bit float fallback

Total Repository Size: 2.50 GB
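
Because each precision sits in its own subfolder, it may be convenient to fetch only the variant you need instead of all 2.50 GB. The sketch below uses `snapshot_download` from `huggingface_hub` with an `allow_patterns` filter. It assumes that each subfolder is a complete CTranslate2 model directory (configuration and tokenizer files next to `model.bin`) and that access to the repository has been granted, so a token may be required. The helper names `allow_patterns_for`, `download_precision`, and `load_precision` are hypothetical.

```python
from pathlib import Path
from typing import Optional

# Subfolder names taken from the File column of the table above.
PRECISIONS = ("float32", "int16", "float16", "int8", "int8_float32", "int8_float16")
REPO_ID = "ragunath-ravi/quantized-whisper-mini-ta"


def allow_patterns_for(precision: str) -> list[str]:
    """Return a download filter that matches one precision subfolder only."""
    if precision not in PRECISIONS:
        raise ValueError(f"unknown precision: {precision!r}")
    return [f"{precision}/*"]


def download_precision(precision: str, token: Optional[str] = None) -> Path:
    """Download a single precision variant and return its local directory."""
    from huggingface_hub import snapshot_download  # imported lazily

    root = snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=allow_patterns_for(precision),
        token=token,  # assumption: needed if the repository is access-gated
    )
    return Path(root) / precision


def load_precision(precision: str, device: str = "cpu", token: Optional[str] = None):
    """Load the downloaded variant with faster-whisper from its local path."""
    from faster_whisper import WhisperModel  # imported lazily

    model_dir = download_precision(precision, token=token)
    return WhisperModel(str(model_dir), device=device, compute_type=precision)
```

For example, `load_precision("int8")` would download only the `int8/` files and load them on the CPU.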

Quick Start

Installation

pip install faster-whisper

Usage

from faster_whisper import WhisperModel

# Load model with desired precision
model = WhisperModel("ragunath-ravi/quantized-whisper-mini-ta", 
                     device="cpu",  # or "cuda"
                     compute_type="int8")  # Choose precision

# Transcribe audio
segments, info = model.transcribe("audio.wav", language="ta")

for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
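
Note that faster-whisper returns `segments` as a generator: transcription runs while you iterate, and the generator can be consumed only once. If you need the full text rather than per-segment output, a small helper like the one below may be handy. The name `collect_transcript` is hypothetical; it only relies on each segment having a `text` attribute.

```python
from typing import Any, Iterable


def collect_transcript(segments: Iterable[Any]) -> str:
    """Consume the segment generator once and join the non-empty texts."""
    parts = [segment.text.strip() for segment in segments]
    return " ".join(part for part in parts if part)
```

Usage would look like `text = collect_transcript(segments)` immediately after calling `model.transcribe(...)`.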

Advanced Usage

from faster_whisper import WhisperModel
import ctranslate2  # installed as a dependency of faster-whisper

# Auto-select device and precision without requiring PyTorch
device = "cuda" if ctranslate2.get_cuda_device_count() > 0 else "cpu"
compute_type = "float16" if device == "cuda" else "int8"

model = WhisperModel(
    "ragunath-ravi/quantized-whisper-mini-ta",
    device=device,
    compute_type=compute_type,
    cpu_threads=4  # Optimize for CPU inference
)

# Transcribe with options
segments, info = model.transcribe(
    "tamil_audio.wav",
    language="ta",
    beam_size=5,
    best_of=5,  # only used when sampling (temperature > 0)
    temperature=0.0,  # deterministic decoding, no temperature fallback
    condition_on_previous_text=False
)

# language="ta" is passed explicitly, so this reports the requested language
print(f"Language: {info.language} ({info.language_probability:.2f})")
print(f"Duration: {info.duration:.2f} seconds")

for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")

Performance Comparison

Precision	Relative Speed	Memory Usage	Quality Loss	Best For
float32	1.0x (baseline)	High	None	Maximum accuracy
float16	~1.5x faster	Medium	Minimal	GPU deployment
int8	~2-3x faster	Low	Small	CPU/Edge devices
int8_float32	~2x faster	Low-Medium	Small	Balanced performance
int8_float16	~2x faster	Low-Medium	Small	GPU optimization
int16	~1.8x faster	Medium-Low	Minimal	Quality/speed balance

Model Selection Guide

🖥️ CPU Deployment

Recommended: int8 or int8_float32
Performance: 2-3x faster than float32
Memory: ~75% reduction

🚀 GPU Deployment

Recommended: float16 or int8_float16
Performance: 1.5-2x faster than float32
Memory: ~50% reduction

📱 Mobile/Edge Devices

Recommended: int8
Performance: Maximum speed
Memory: Minimum usage

🎯 High Accuracy Needs

Recommended: float32 or float16
Performance: Best quality
Memory: Higher usage
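
The guide above can be encoded as a small lookup, sketched below. The target names and the `pick_compute_type` helper are hypothetical; the recommendations themselves are copied from the four sections above, with the first option in each treated as the default.

```python
from typing import Iterable, Optional

# Recommendations copied from the selection guide; first entry is the default.
RECOMMENDED = {
    "cpu": ("int8", "int8_float32"),
    "gpu": ("float16", "int8_float16"),
    "edge": ("int8",),
    "accuracy": ("float32", "float16"),
}


def pick_compute_type(target: str, supported: Optional[Iterable[str]] = None) -> str:
    """Return the first recommended compute type the hardware supports."""
    if target not in RECOMMENDED:
        raise ValueError(f"unknown target: {target!r}")
    candidates = RECOMMENDED[target]
    if supported is None:
        return candidates[0]
    supported = set(supported)
    for candidate in candidates:
        if candidate in supported:
            return candidate
    raise ValueError(f"no recommended compute type for {target!r} is supported")
```

The `supported` argument could be filled from `ctranslate2.get_supported_compute_types("cpu")` or `("cuda")`, which reports what the current machine can actually run.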

Model Details

Original Model Information

Fine-tuned from: openai/whisper-small
Dataset: whisperaudio (ragunath123/whisperaudio)
Training samples: 12,000
Evaluation samples: 3,000
Best WER: 18.7042%

Quantization Details

Framework: CTranslate2
Optimization: faster-whisper compatible
Supported devices: CPU, CUDA
Memory optimized: Yes

Intended Uses

✅ Suitable Applications

Real-time Tamil speech transcription
Batch processing of Tamil audio content
Voice command systems for Tamil speakers
Accessibility tools for Tamil-speaking users
Subtitling and captioning for Tamil media
Mobile and edge deployment
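
As an illustration of the subtitling use case, the segments returned by faster-whisper can be written out in SubRip (SRT) format. This is a minimal sketch; `srt_timestamp` and `segments_to_srt` are hypothetical helpers that rely only on each segment having `start`, `end`, and `text` attributes.

```python
from typing import Any, Iterable


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: Iterable[Any]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, segment in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(segment.start)} --> {srt_timestamp(segment.end)}\n"
            f"{segment.text.strip()}\n"
        )
    return "\n".join(cues)
```

The result can be saved with `Path("output.srt").write_text(srt, encoding="utf-8")` so that Tamil text is preserved.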

⚠️ Limitations

Model may struggle with heavily accented Tamil speech or regional dialects
Performance may degrade with noisy audio or low-quality recordings
May have difficulty with specialized terminology absent from the training data
Optimized specifically for Tamil; not intended for other languages
Quantization may introduce small accuracy degradation

Technical Specifications

Framework Versions

CTranslate2: Latest compatible version
faster-whisper: Latest version
Original training: Transformers 4.40.2, PyTorch 2.7.0+cu126

Audio Requirements

Sampling rate: 16 kHz (auto-resampled if different)
Format: WAV, MP3, FLAC, M4A (most common formats)
Channels: Mono preferred (stereo auto-converted)
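
If you want to check an input file against these requirements before transcription, the standard-library `wave` module is enough for PCM WAV files. The sketch below only inspects the header; `describe_wav` is a hypothetical helper, and other formats (MP3, FLAC, M4A) would need a different reader.

```python
import wave

TARGET_RATE_HZ = 16_000  # sampling rate listed in the requirements above


def describe_wav(path: str) -> dict:
    """Report sample rate, channel count, and duration of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        frames = wav.getnframes()
    return {
        "sample_rate_hz": rate,
        "channels": channels,
        "duration_s": frames / rate,
        "needs_resample": rate != TARGET_RATE_HZ,  # will be auto-resampled
        "needs_downmix": channels != 1,            # stereo will be auto-converted
    }
```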

Benchmarks

Speed Comparison (CPU - Intel i7-12700K)

Precision	Load Time	Transcribe Time (60s audio)	Memory Usage
float32	3.2s	45.6s	2.8 GB
float16	2.8s	31.2s	1.9 GB
int8	1.9s	18.4s	1.2 GB
int8_float32	2.1s	22.1s	1.4 GB
int16	2.3s	26.8s	1.6 GB

Speed Comparison (GPU - RTX 4090)

Precision	Load Time	Transcribe Time (60s audio)	VRAM Usage
float32	4.1s	12.3s	3.2 GB
float16	3.2s	8.7s	1.8 GB
int8_float16	2.9s	9.2s	1.5 GB
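
The tables above can be turned into speedups relative to float32 and real-time factors (processing time divided by audio duration; values below 1 mean faster than real time). The sketch below only does arithmetic on the transcribe times reported in the tables; the function names are hypothetical.

```python
AUDIO_SECONDS = 60.0  # length of the benchmark clip

# Transcribe times in seconds, copied from the two tables above.
CPU_TRANSCRIBE_S = {
    "float32": 45.6, "float16": 31.2, "int8": 18.4,
    "int8_float32": 22.1, "int16": 26.8,
}
GPU_TRANSCRIBE_S = {"float32": 12.3, "float16": 8.7, "int8_float16": 9.2}


def speedup_vs_float32(times: dict) -> dict:
    """Speedup of each precision relative to the float32 baseline."""
    baseline = times["float32"]
    return {name: baseline / seconds for name, seconds in times.items()}


def real_time_factor(times: dict, audio_seconds: float = AUDIO_SECONDS) -> dict:
    """Processing time per second of audio for each precision."""
    return {name: seconds / audio_seconds for name, seconds in times.items()}
```

On these numbers, int8 on the CPU works out to about 2.5x faster than float32, while float16 on the GPU works out to about 1.4x, slightly below the approximate range given in the selection guide.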

Citation

If you use this quantized model, please cite both the original model (ragunath-ravi/whisper-mini-ta) and this quantized release.

License

This model is released under the Apache 2.0 License, same as the original model.

Acknowledgments

Original Whisper model by OpenAI
Fine-tuning by Ragunath Ravi
Quantization optimizations using CTranslate2 and faster-whisper
Tamil speech dataset: whisperaudio

For issues or questions, please refer to the original model repository or create an issue in this repository.
