HumAware-VAD: Humming-Aware Voice Activity Detection

📌 Overview

HumAware-VAD is a fine-tuned version of the Silero-VAD model, trained to distinguish humming from actual speech. Standard Voice Activity Detection (VAD) models, including Silero-VAD, often misclassify humming as speech, which leads to inaccurate speech segmentation. HumAware-VAD addresses this by fine-tuning on a custom dataset, HumSpeechBlend, to improve speech detection accuracy when humming is present.

🎯 Purpose

The primary goal of HumAware-VAD is to:

  • Reduce false positives where humming is mistakenly detected as speech.
  • Enhance speech segmentation accuracy in real-world applications.
  • Improve VAD performance for tasks involving music, background noise, and vocal sounds.
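To make the false-positive goal measurable, one option is to score audio windows whose ground truth is known and count errors separately for humming and for speech, since a model could trivially avoid humming false positives by missing speech too. The sketch below is illustrative only: `window_error_rates` and its "speech"/"hum" label scheme are hypothetical and are not part of HumAware-VAD or HumSpeechBlend; it assumes you already have one speech probability per window and a 0.5 decision threshold.

```python
from typing import Sequence, Tuple


def window_error_rates(probs: Sequence[float], labels: Sequence[str],
                       threshold: float = 0.5) -> Tuple[float, float]:
    """Return (hum false-positive rate, speech miss rate) over labelled windows.

    Hypothetical label scheme: each window is "speech", "hum", or anything else
    (e.g. "silence"), which is ignored by both rates.
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    hum = [p for p, y in zip(probs, labels) if y == "hum"]
    speech = [p for p, y in zip(probs, labels) if y == "speech"]
    # Humming windows scored as speech are false positives.
    fpr = sum(p >= threshold for p in hum) / len(hum) if hum else 0.0
    # Speech windows scored below the threshold are misses.
    miss = sum(p < threshold for p in speech) / len(speech) if speech else 0.0
    return fpr, miss
```

For example, `window_error_rates([0.9, 0.8, 0.6, 0.2, 0.1], ["speech", "speech", "hum", "hum", "silence"])` returns `(0.5, 0.0)`: one of the two humming windows is flagged as speech and no speech window is missed.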

πŸ—‚οΈ Model Details

  • Base Model: Silero-VAD
  • Fine-tuning Dataset: HumSpeechBlend
  • Format: JIT (TorchScript)
  • Framework: PyTorch
  • Inference Speed: Real-time
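One way to check the real-time claim on your own hardware is the real-time factor (RTF): processing time divided by the duration of the audio processed, where a value below 1.0 means faster than real time. This is a minimal sketch; `real_time_factor` is a hypothetical helper that times any per-window scoring callable, and the 16 kHz / 512-sample defaults assume Silero-VAD's window size at 16 kHz.

```python
import time
from typing import Callable, Sequence


def real_time_factor(score_window: Callable[[Sequence[float]], float],
                     audio: Sequence[float], sample_rate: int = 16000,
                     window: int = 512) -> float:
    """Processing time / audio duration; < 1.0 means faster than real time.

    Only whole windows are scored; a trailing partial window is skipped, and the
    duration is computed over the scored samples so both sides match.
    """
    n_windows = len(audio) // window
    if n_windows == 0:
        raise ValueError("audio is shorter than one window")
    start = time.perf_counter()
    for k in range(n_windows):
        score_window(audio[k * window:(k + 1) * window])
    elapsed = time.perf_counter() - start
    return elapsed / (n_windows * window / sample_rate)
```

With the loaded model, `score_window` could wrap a call such as `vad_model(torch.tensor(w).unsqueeze(0), 16000)`, assuming HumAware-VAD keeps Silero-VAD's call signature; the list-to-tensor conversion is then included in the measured time.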

📥 Download & Usage

🔹 Install Dependencies

pip install torch torchaudio

🔹 Load the Model

import torch

def load_humaware_vad(model_path="humaware_vad.jit"):
    # Load the TorchScript checkpoint onto the CPU and switch to inference mode
    model = torch.jit.load(model_path, map_location="cpu")
    model.eval()
    return model

vad_model = load_humaware_vad()

🔹 Run Inference

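import torchaudio

# Silero-VAD v5 scores mono 16 kHz audio in 512-sample windows via model(chunk, sr);
# this assumes HumAware-VAD keeps that interface.
waveform, sample_rate = torchaudio.load("data/0000.wav")
waveform = torchaudio.functional.resample(waveform.mean(dim=0), sample_rate, 16000)
chunk = waveform[:512].unsqueeze(0)  # first window, shape [1, 512]
with torch.no_grad():
    out = vad_model(chunk, 16000)  # speech probability for this window
print("VAD Output:", out.item())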
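To score an entire file, a Silero-style model can be applied over the audio in consecutive 512-sample windows, and the resulting probabilities grouped into speech segments. This is a simplified sketch, not the segmentation logic shipped with the model: `score_windows` and `probs_to_segments` are hypothetical helpers, the model is assumed to keep Silero-VAD's `model(chunk, sample_rate)` interface (and, optionally, its `reset_states()` method), and the 0.5 onset / 0.35 release thresholds are illustrative. Silero-VAD's own `get_speech_timestamps` utility implements a fuller version with minimum-duration and padding rules.

```python
import torch
import torch.nn.functional as F

SAMPLE_RATE = 16000  # assumption: Silero-VAD's 16 kHz mode
WINDOW = 512         # assumption: Silero-VAD v5 window size at 16 kHz


def score_windows(model, waveform, sample_rate=SAMPLE_RATE, window=WINDOW):
    """Return one speech probability per `window`-sample chunk of a mono 1-D waveform."""
    if hasattr(model, "reset_states"):
        model.reset_states()  # clear recurrent state left over from a previous file
    probs = []
    with torch.no_grad():
        for start in range(0, waveform.shape[-1], window):
            chunk = waveform[start:start + window]
            if chunk.shape[-1] < window:
                # Zero-pad the final partial window to the full window length.
                chunk = F.pad(chunk, (0, window - chunk.shape[-1]))
            probs.append(float(model(chunk.unsqueeze(0), sample_rate)))
    return probs


def probs_to_segments(probs, threshold=0.5, release=0.35,
                      window=WINDOW, sample_rate=SAMPLE_RATE):
    """Group per-window probabilities into (start_s, end_s) speech segments.

    Hysteresis: a segment opens when p >= threshold and closes at the first
    window with p < release, so brief dips between the two values are bridged.
    """
    to_sec = window / sample_rate
    segments, start = [], None
    for i, p in enumerate(probs):
        if start is None and p >= threshold:
            start = i
        elif start is not None and p < release:
            segments.append((start * to_sec, i * to_sec))
            start = None
    if start is not None:  # speech runs to the end of the audio
        segments.append((start * to_sec, len(probs) * to_sec))
    return segments
```

For example, `probs_to_segments(score_windows(vad_model, waveform))` on a mono 16 kHz 1-D waveform tensor returns a list of `(start_seconds, end_seconds)` tuples. Using a lower release threshold than the onset threshold keeps a segment from fragmenting when the probability wavers around 0.5.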

📄 Citation

If you use this model, please cite it as follows:

@misc{HumAwareVAD2025,
  author = {Sourabh Saini},
  title = {HumAware-VAD: Humming-Aware Voice Activity Detection},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/CuriousMonkey7/HumAware-VAD}
}