metadata
license: mit
datasets:
  - CuriousMonkey7/HumSpeechBlend
language:
  - en
base_model:
  - freddyaboulton/silero-vad
pipeline_tag: voice-activity-detection
tags:
  - vad
  - speech
  - audio
  - voice_activity_detection
  - silero-vad

HumAware-VAD: Humming-Aware Voice Activity Detection

📌 Overview

HumAware-VAD is a fine-tuned version of Silero-VAD, trained to distinguish humming from speech. Standard voice activity detection (VAD) models, including Silero-VAD, often misclassify humming as speech, which produces inaccurate speech segments. HumAware-VAD addresses this by fine-tuning on a custom dataset, HumSpeechBlend, so that humming is less likely to be flagged as speech.

🎯 Purpose

The primary goal of HumAware-VAD is to:

  • Reduce false positives where humming is mistakenly detected as speech.
  • Enhance speech segmentation accuracy in real-world applications.
  • Improve VAD performance for tasks involving music, background noise, and non-speech vocal sounds such as humming.
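
The first goal can be made measurable. The sketch below is one possible way to quantify humming false positives, assuming you already have per-frame speech probabilities for a clip that contains only humming. The function name, the 0.5 threshold, and the sample values are illustrative assumptions, not part of the model or its evaluation.

```python
def humming_false_positive_rate(speech_probs, threshold=0.5):
    """Fraction of frames from humming-only audio that are flagged as speech."""
    if not speech_probs:
        raise ValueError("speech_probs must contain at least one frame")
    # Every frame at or above the threshold counts as a false positive,
    # because the clip is assumed to contain no speech at all.
    flagged = sum(1 for p in speech_probs if p >= threshold)
    return flagged / len(speech_probs)


# Made-up probabilities, only to show the call; not real model output.
example_probs = [0.9, 0.2, 0.7, 0.1]
print(humming_false_positive_rate(example_probs))  # 0.5
```

A lower value on humming-only clips means fewer frames were mistaken for speech.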

πŸ—‚οΈ Model Details

  • Base Model: Silero-VAD
  • Fine-tuning Dataset: HumSpeechBlend
  • Format: JIT (TorchScript)
  • Framework: PyTorch
  • Inference Speed: Real-time
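
"Real-time" can be checked on your own hardware with a real-time factor (processing time divided by audio duration). The following is a minimal sketch; `real_time_factor` and `dummy_process` are hypothetical helpers, and the stand-in workload should be replaced by a call that runs the VAD over a clip of known length.

```python
import time


def real_time_factor(process, audio_seconds):
    """Return processing time / audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    start = time.perf_counter()
    process()
    elapsed = time.perf_counter() - start
    return elapsed / audio_seconds


def dummy_process():
    # Stand-in workload; swap in the code that runs the model over the clip.
    sum(i * i for i in range(100_000))


rtf = real_time_factor(dummy_process, audio_seconds=10.0)
print(f"real-time factor: {rtf:.4f}")
```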

📥 Download & Usage

🔹 Install Dependencies

pip install torch torchaudio

🔹 Load the Model

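import torch

def load_humaware_vad(model_path="humaware_vad.jit"):
    # Load the TorchScript (JIT) checkpoint from a local file onto the CPU.
    model = torch.jit.load(model_path, map_location="cpu")
    # Switch to evaluation mode so training-only behavior is disabled.
    model.eval()
    return model

vad_model = load_humaware_vad()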
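
The loader above expects the checkpoint to be on disk already. One possible way to fetch it is through the `huggingface_hub` package (an extra dependency, installed with `pip install huggingface_hub`). In this sketch the repository ID comes from the citation URL, while the file name `humaware_vad.jit` is an assumption based on the default path above and may not match the file actually stored in the repository. `fetch_humaware_vad` is a hypothetical helper name.

```python
REPO_ID = "CuriousMonkey7/HumAware-VAD"  # taken from the citation URL
FILENAME = "humaware_vad.jit"            # assumption: may differ in the repository


def fetch_humaware_vad(repo_id=REPO_ID, filename=FILENAME):
    """Download the checkpoint from the Hub (cached locally) and load it on CPU."""
    # Imported lazily so that defining this helper does not require the packages.
    import torch
    from huggingface_hub import hf_hub_download

    local_path = hf_hub_download(repo_id=repo_id, filename=filename)
    model = torch.jit.load(local_path, map_location="cpu")
    model.eval()
    return model
```

Calling `fetch_humaware_vad()` needs network access on the first run; later runs reuse the local cache.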

🔹 Run Inference

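import torchaudio

# Load the audio; waveform has shape (channels, samples).
waveform, sample_rate = torchaudio.load("data/0000.wav")

# Note: upstream Silero-VAD models expect mono 8 kHz or 16 kHz audio and are
# usually called on short chunks as model(chunk, sample_rate). If the call
# below fails, resample the audio and feed it chunk by chunk instead.
out = vad_model(waveform)
print("VAD Output:", out)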
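
Frame-level VAD output usually has to be turned into speech segments before it is useful. The sketch below shows one way to do that in plain Python: split the samples into fixed-size frames, score each frame with a `predict` callable, and merge consecutive speech frames into time ranges. The function names, the 512-sample frame size, the 16 kHz rate, and the 0.5 threshold are assumptions for illustration, and `toy_predict` is a stand-in energy check, not the model.

```python
def frame_probabilities(samples, predict, frame_size=512):
    """Split samples into fixed-size frames and score each with predict(frame)."""
    probs = []
    for start in range(0, len(samples), frame_size):
        frame = list(samples[start:start + frame_size])
        # Zero-pad the last frame so every call sees the same length.
        frame += [0.0] * (frame_size - len(frame))
        probs.append(float(predict(frame)))
    return probs


def probabilities_to_segments(probs, threshold=0.5, frame_size=512, sample_rate=16000):
    """Merge consecutive frames at or above threshold into (start_s, end_s) pairs."""
    segments = []
    start = None
    for i, p in enumerate(probs):
        if p >= threshold and start is None:
            start = i  # a speech run begins
        elif p < threshold and start is not None:
            segments.append((start * frame_size / sample_rate, i * frame_size / sample_rate))
            start = None  # the run ended on the previous frame
    if start is not None:
        # The audio ended while still inside a speech run.
        segments.append((start * frame_size / sample_rate, len(probs) * frame_size / sample_rate))
    return segments


def toy_predict(frame):
    # Stand-in for the model: "speech" if any sample is loud enough.
    return 1.0 if max(abs(x) for x in frame) > 0.1 else 0.0


samples = [0.0] * 512 + [0.5] * 1024 + [0.0] * 512
probs = frame_probabilities(samples, toy_predict)
segments = probabilities_to_segments(probs)
print(segments)  # [(0.032, 0.096)]
```

To use the real model, `predict` would wrap the `vad_model` call. Upstream Silero-VAD JIT models are typically invoked as `model(chunk, sample_rate)` on 512-sample chunks at 16 kHz, but whether this checkpoint follows the same calling convention is an assumption to verify.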

📄 Citation

If you use this model, please cite it accordingly.

@misc{HumAwareVAD2025,
  author = {Sourabh Saini},
  title = {HumAware-VAD: Humming-Aware Voice Activity Detection},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/CuriousMonkey7/HumAware-VAD}
}