Quantized Whisper Mini Tamil (quantized-whisper-mini-ta)

This repository contains quantized versions of the fine-tuned Tamil Whisper model ragunath-ravi/whisper-mini-ta, optimized for faster inference with faster-whisper and CTranslate2.

Model Overview

This is a collection of quantized versions of a Whisper Small model fine-tuned for Tamil automatic speech recognition (ASR). Despite the "mini" in the repository name, the base model is OpenAI Whisper Small. The original model achieved a Word Error Rate (WER) of 18.70% on its evaluation set.

Original Model Performance

  • Loss: 0.0905
  • WER: 18.7042%
  • Language: Tamil (ta)
  • Base Model: OpenAI Whisper Small
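WER is the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The following is a minimal sketch of that calculation, not the evaluation script used for this model. The function name and the plain whitespace tokenization are assumptions; real evaluations usually normalize the text first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so
    # far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of three reference words.
print(word_error_rate("நான் பள்ளிக்கு செல்கிறேன்", "நான் பள்ளி செல்கிறேன்"))
```

A WER of 18.70% therefore means roughly 19 word-level errors per 100 reference words.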

🚀 CTranslate2

Model size: 244M params
Architecture: whisper
Language: Tamil (ta)
Framework: faster-whisper

Available Model Files

| Precision | File | Size | Compute Type | Description |
|---|---|---|---|---|
| float32 | float32/model.bin | 0.90 GB | float32 | Full precision (32-bit floating point) |
| int16 | int16/model.bin | 0.45 GB | int16 | 16-bit integer quantization |
| float16 | float16/model.bin | 0.45 GB | float16 | Half precision (16-bit floating point) |
| int8 | int8/model.bin | 0.23 GB | int8 | 8-bit integer quantization |
| int8_float32 | int8_float32/model.bin | 0.23 GB | int8_float32 | 8-bit integer with 32-bit float fallback |
| int8_float16 | int8_float16/model.bin | 0.23 GB | int8_float16 | 8-bit integer with 16-bit float fallback |

Total Repository Size: 2.50 GB
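The file sizes follow closely from the parameter count and the number of bytes stored per weight. The sketch below is a back-of-the-envelope estimate that ignores file overhead and layers kept at higher precision. It assumes the 244M figure is exact and that sizes are reported in GiB, neither of which this card confirms.

```python
# Parameter count taken from the "Model size: 244M params" line.
PARAMS = 244_000_000

# Standard storage width, in bytes, of each weight type.
BYTES_PER_WEIGHT = {"float32": 4, "float16": 2, "int16": 2, "int8": 1}


def estimate_size_gib(params: int, bytes_per_weight: int) -> float:
    """Raw weight size in GiB (2**30 bytes), ignoring any file overhead."""
    return params * bytes_per_weight / 2**30


for name, width in BYTES_PER_WEIGHT.items():
    print(f"{name:8s} ~{estimate_size_gib(PARAMS, width):.2f} GiB")
```

The estimates land within about 0.01 of the listed sizes. The int8_float32 and int8_float16 files match the int8 size, which is consistent with their weights being stored as 8-bit integers.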

Quick Start

Installation

pip install faster-whisper

Usage

from faster_whisper import WhisperModel

# Load model with desired precision
model = WhisperModel("ragunath-ravi/quantized-whisper-mini-ta", 
                     device="cpu",  # or "cuda"
                     compute_type="int8")  # Choose precision

# Transcribe audio
segments, info = model.transcribe("audio.wav", language="ta")

for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
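The files table lists each precision under its own subdirectory, and faster-whisper can also load a model from a local directory. One option is therefore to download only the subfolder you need and point `WhisperModel` at it. This is a sketch under assumptions: the helper names are hypothetical, and it assumes each subfolder also contains the config and tokenizer files that faster-whisper expects, which this card does not confirm.

```python
from pathlib import Path

REPO_ID = "ragunath-ravi/quantized-whisper-mini-ta"
PRECISIONS = ("float32", "int16", "float16", "int8", "int8_float32", "int8_float16")


def subfolder_pattern(precision: str) -> str:
    """Return the glob pattern that selects one precision subfolder."""
    if precision not in PRECISIONS:
        raise ValueError(f"unknown precision: {precision!r}")
    return f"{precision}/*"


def download_precision(precision: str, repo_id: str = REPO_ID) -> Path:
    """Download a single precision subfolder and return its local path."""
    # Imported lazily so the helpers above work without the dependency.
    from huggingface_hub import snapshot_download

    snapshot_dir = snapshot_download(
        repo_id, allow_patterns=[subfolder_pattern(precision)]
    )
    return Path(snapshot_dir) / precision


def load_model(precision: str = "int8", device: str = "cpu"):
    """Load the chosen precision from its downloaded subfolder."""
    from faster_whisper import WhisperModel

    model_dir = download_precision(precision)
    return WhisperModel(str(model_dir), device=device, compute_type=precision)
```

Calling `load_model("int8")` would then fetch only the `int8/` files rather than the full repository.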

Advanced Usage

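from faster_whisper import WhisperModel
import ctranslate2

# Auto-select device and precision. ctranslate2 is installed as a dependency
# of faster-whisper, so this check does not require PyTorch.
device = "cuda" if ctranslate2.get_cuda_device_count() > 0 else "cpu"
compute_type = "float16" if device == "cuda" else "int8"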

model = WhisperModel(
    "ragunath-ravi/quantized-whisper-mini-ta",
    device=device,
    compute_type=compute_type,
    cpu_threads=4  # Optimize for CPU inference
)

# Transcribe with options
segments, info = model.transcribe(
    "tamil_audio.wav",
    language="ta",
    beam_size=5,
    best_of=5,  # only takes effect when sampling with temperature > 0
    temperature=0.0,
    condition_on_previous_text=False
)

print(f"Detected language: {info.language} ({info.language_probability:.2f})")
print(f"Duration: {info.duration:.2f} seconds")

for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
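In faster-whisper, `segments` is a generator: transcription runs as the loop iterates, and the results can be consumed only once. If you need both the full text and the timed segments, one approach is to collect them in a single pass. `collect_transcript` is a hypothetical helper, not part of faster-whisper.

```python
from typing import Iterable, List, Tuple


def collect_transcript(segments: Iterable) -> Tuple[str, List[Tuple[float, float, str]]]:
    """Consume a segment iterable once; return the full text and timed rows."""
    # Each segment is expected to expose .start, .end and .text.
    rows = [(seg.start, seg.end, seg.text.strip()) for seg in segments]
    full_text = " ".join(text for _, _, text in rows)
    return full_text, rows
```

For example, `text, rows = collect_transcript(segments)` can replace the printing loop when the transcript is needed later.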

Performance Comparison

| Precision | Relative Speed | Memory Usage | Quality Loss | Best For |
|---|---|---|---|---|
| float32 | 1.0x (baseline) | High | None | Maximum accuracy |
| float16 | ~1.5x faster | Medium | Minimal | GPU deployment |
| int8 | ~2-3x faster | Low | Small | CPU/Edge devices |
| int8_float32 | ~2x faster | Low-Medium | Small | Balanced performance |
| int8_float16 | ~2x faster | Low-Medium | Small | GPU optimization |
| int16 | ~1.8x faster | Medium-Low | Minimal | Quality/speed balance |

Model Selection Guide

🖥️ CPU Deployment

  • Recommended: int8 or int8_float32
  • Performance: 2-3x faster than float32
  • Memory: ~75% reduction

🚀 GPU Deployment

  • Recommended: float16 or int8_float16
  • Performance: 1.5-2x faster than float32
  • Memory: ~50% reduction

📱 Mobile/Edge Devices

  • Recommended: int8
  • Performance: Maximum speed
  • Memory: Minimum usage

🎯 High Accuracy Needs

  • Recommended: float32 or float16
  • Performance: Best quality
  • Memory: Higher usage
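The selection guide can be expressed as a small lookup table. This is one possible reading of the recommendations, not an official rule: the goal names (`speed`, `memory`, `accuracy`) and the function name are hypothetical, and where the guide offers two options the sketch picks one.

```python
# (device, goal) -> compute type, following the guide's recommendations.
RECOMMENDED = {
    ("cpu", "speed"): "int8",
    ("cpu", "memory"): "int8",
    ("cpu", "accuracy"): "float32",
    ("cuda", "speed"): "float16",
    ("cuda", "memory"): "int8_float16",
    ("cuda", "accuracy"): "float32",
}


def recommend_compute_type(device: str, goal: str = "speed") -> str:
    """Return a compute type for the given device and optimization goal."""
    try:
        return RECOMMENDED[(device, goal)]
    except KeyError:
        raise ValueError(f"no recommendation for device={device!r}, goal={goal!r}")


print(recommend_compute_type("cpu"))             # int8
print(recommend_compute_type("cuda", "memory"))  # int8_float16
```

The returned string can be passed directly as the `compute_type` argument of `WhisperModel`.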

Model Details

Original Model Information

  • Fine-tuned from: openai/whisper-small
  • Dataset: whisperaudio (ragunath123/whisperaudio)
  • Training samples: 12,000
  • Evaluation samples: 3,000
  • Best WER: 18.7042%

Quantization Details

  • Framework: CTranslate2
  • Optimization: faster-whisper compatible
  • Supported devices: CPU, CUDA
  • Memory optimized: Yes

Intended Uses

✅ Suitable Applications

  • Real-time Tamil speech transcription
  • Batch processing of Tamil audio content
  • Voice command systems for Tamil speakers
  • Accessibility tools for Tamil-speaking users
  • Subtitling and captioning for Tamil media
  • Mobile and edge deployment
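For the subtitling use case, transcription segments map naturally onto SRT cues. The sketch below formats any objects exposing `.start`, `.end` and `.text` (as faster-whisper segments do) into SRT text. The helper names are hypothetical, and cue splitting and line-length limits are left out.

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Build SRT text from objects that expose .start, .end and .text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(seg.start)} --> {format_timestamp(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(blocks)
```

Writing the result to a file with `encoding="utf-8"` keeps the Tamil text intact.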

⚠️ Limitations

  • Model may struggle with heavily accented Tamil speech or regional dialects
  • Performance may degrade with noisy audio or low-quality recordings
  • Difficulty with specialized terminology not in training data
  • Optimized specifically for Tamil; not intended for other languages
  • Quantization may introduce small accuracy degradation

Technical Specifications

Framework Versions

  • CTranslate2: Latest compatible version
  • faster-whisper: Latest version
  • Original training: Transformers 4.40.2, PyTorch 2.7.0+cu126

Audio Requirements

  • Sampling rate: 16 kHz (auto-resampled if different)
  • Format: WAV, MP3, FLAC, M4A (most common formats)
  • Channels: Mono preferred (stereo auto-converted)
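Although other rates and stereo input are converted automatically, it can be useful to check a file before batch processing. The following sketch inspects a WAV file with Python's standard `wave` module. It handles only uncompressed WAV, and `inspect_wav` is a hypothetical helper name.

```python
import wave


def inspect_wav(path: str) -> dict:
    """Report sample rate, channel count and duration of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        frames = wav.getnframes()
    return {
        "sample_rate": rate,
        "channels": channels,
        "duration_s": frames / rate,
        # True when the file already matches the preferred 16 kHz mono input.
        "matches_preferred": rate == 16_000 and channels == 1,
    }
```

Files where `matches_preferred` is false still work; they are resampled or downmixed at load time.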

Benchmarks

Speed Comparison (CPU - Intel i7-12700K)

| Precision | Load Time | Transcribe Time (60s audio) | Memory Usage |
|---|---|---|---|
| float32 | 3.2s | 45.6s | 2.8 GB |
| float16 | 2.8s | 31.2s | 1.9 GB |
| int8 | 1.9s | 18.4s | 1.2 GB |
| int8_float32 | 2.1s | 22.1s | 1.4 GB |
| int16 | 2.3s | 26.8s | 1.6 GB |

Speed Comparison (GPU - RTX 4090)

| Precision | Load Time | Transcribe Time (60s audio) | VRAM Usage |
|---|---|---|---|
| float32 | 4.1s | 12.3s | 3.2 GB |
| float16 | 3.2s | 8.7s | 1.8 GB |
| int8_float16 | 2.9s | 9.2s | 1.5 GB |
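Two figures make these timings easier to compare: the real-time factor (transcription time divided by audio duration, where values below 1 are faster than real time) and the speedup over float32. The sketch below derives both from the CPU table; the numbers are copied from the table and the function names are illustrative.

```python
AUDIO_SECONDS = 60.0

# Transcribe times from the CPU benchmark table (Intel i7-12700K).
CPU_TRANSCRIBE_SECONDS = {
    "float32": 45.6,
    "float16": 31.2,
    "int8": 18.4,
    "int8_float32": 22.1,
    "int16": 26.8,
}


def real_time_factor(transcribe_seconds: float, audio_seconds: float = AUDIO_SECONDS) -> float:
    """Processing time per second of audio; below 1.0 is faster than real time."""
    return transcribe_seconds / audio_seconds


def speedup(baseline_seconds: float, seconds: float) -> float:
    """How many times faster a run is than the baseline."""
    return baseline_seconds / seconds


baseline = CPU_TRANSCRIBE_SECONDS["float32"]
for name, seconds in CPU_TRANSCRIBE_SECONDS.items():
    print(f"{name:13s} RTF={real_time_factor(seconds):.2f} "
          f"speedup={speedup(baseline, seconds):.2f}x")
```

On these numbers every CPU configuration runs faster than real time, and int8 is about 2.5x faster than float32.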

Citation

If you use this quantized model, please cite both the original model (ragunath-ravi/whisper-mini-ta) and this quantized repository (ragunath-ravi/quantized-whisper-mini-ta).

License

This model is released under the Apache 2.0 License, same as the original model.

Acknowledgments

  • Original Whisper model by OpenAI
  • Fine-tuning by Ragunath Ravi
  • Quantization optimizations using CTranslate2 and faster-whisper
  • Tamil speech dataset: whisperaudio

For issues or questions, please refer to the original model repository or create an issue in this repository.
