metadata

license: mit
datasets:
  - CuriousMonkey7/HumSpeechBlend
language:
  - en
base_model:
  - freddyaboulton/silero-vad
pipeline_tag: voice-activity-detection
tags:
  - vad
  - speech
  - audio
  - voice_activity_detection
  - silero-vad

HumAware-VAD: Humming-Aware Voice Activity Detection

📌 Overview

HumAware-VAD is a fine-tuned version of the Silero-VAD model, trained to distinguish humming from actual speech. Standard voice activity detection (VAD) models, including Silero-VAD, often misclassify humming as speech, which leads to inaccurate speech segmentation. HumAware-VAD addresses this by fine-tuning on a custom dataset, HumSpeechBlend, so that humming is less likely to be flagged as speech.

🎯 Purpose

The primary goals of HumAware-VAD are to:

Reduce false positives where humming is mistakenly detected as speech.
Enhance speech segmentation accuracy in real-world applications.
Improve VAD performance for tasks involving music, background noise, and vocal sounds.

🗂️ Model Details

Base Model: Silero-VAD
Fine-tuning Dataset: HumSpeechBlend
Format: JIT (TorchScript)
Framework: PyTorch
Inference Speed: Real-time

📥 Download & Usage

🔹 Install Dependencies

pip install torch torchaudio

🔹 Load the Model

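import torch

def load_humaware_vad(model_path="humaware_vad.jit"):
    """Load the TorchScript (JIT) model from a local file and switch it to inference mode."""
    model = torch.jit.load(model_path)
    model.eval()  # disable training-only behavior such as dropout
    return model

# Assumes humaware_vad.jit has already been downloaded to the working directory.
vad_model = load_humaware_vad()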

🔹 Run Inference

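import torch
import torchaudio

# Assumption: the model keeps Silero-VAD's interface, i.e. it is called as
# model(chunk, sample_rate) on 16 kHz mono audio in 512-sample chunks.
waveform, sample_rate = torchaudio.load("data/0000.wav")
waveform = torchaudio.functional.resample(waveform.mean(dim=0), sample_rate, 16000)
with torch.no_grad():
    out = vad_model(waveform[:512].unsqueeze(0), 16000)  # speech probability of the first chunk
print("VAD Output:", out)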
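
🔹 From Frame Scores to Speech Segments (sketch)

A VAD model of this kind typically produces one speech probability per short audio frame rather than ready-made segments. The following is a minimal sketch of how such scores could be collected and turned into time spans. It is not part of the released model: the helper names `frame_probabilities` and `probs_to_segments`, the 512-sample frame size, the 16 kHz sample rate, and the 0.5 threshold are all assumptions for illustration (the frame size and rate follow the Silero-VAD convention).

```python
from typing import Callable, List, Sequence, Tuple

FRAME_SAMPLES = 512   # assumed frame size for 16 kHz audio (Silero-VAD convention)
SAMPLE_RATE = 16000   # assumed sample rate in Hz


def frame_probabilities(
    samples: Sequence,
    score_frame: Callable[[Sequence], float],
    frame_samples: int = FRAME_SAMPLES,
) -> List[float]:
    """Score consecutive, non-overlapping frames of `samples`.

    `score_frame` is any callable that maps one frame to a speech
    probability. A trailing frame shorter than `frame_samples` is dropped.
    """
    probs = []
    for start in range(0, len(samples) - frame_samples + 1, frame_samples):
        frame = samples[start:start + frame_samples]
        probs.append(float(score_frame(frame)))
    return probs


def probs_to_segments(
    probs: Sequence[float],
    threshold: float = 0.5,
    frame_samples: int = FRAME_SAMPLES,
    sample_rate: int = SAMPLE_RATE,
) -> List[Tuple[float, float]]:
    """Merge runs of frames scoring at or above `threshold` into (start, end) times in seconds."""
    frame_sec = frame_samples / sample_rate
    segments = []
    run_start = None  # index of the first frame in the current speech run
    for i, p in enumerate(probs):
        if p >= threshold and run_start is None:
            run_start = i
        elif p < threshold and run_start is not None:
            segments.append((run_start * frame_sec, i * frame_sec))
            run_start = None
    if run_start is not None:
        # Audio ended while still inside a speech run.
        segments.append((run_start * frame_sec, len(probs) * frame_sec))
    return segments
```

If the model follows Silero-VAD's calling convention (an assumption, not confirmed by this card), `score_frame` could wrap it as `lambda f: vad_model(f.unsqueeze(0), 16000).item()` with `samples` being a 1-D, 16 kHz mono tensor. A simple threshold has no smoothing, so short gaps or spikes will split or create segments; production pipelines usually add minimum-duration rules on top.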

📄 Citation

If you use this model, please cite it as follows:

@misc{HumAwareVAD2025,
  author = {Sourabh Saini},
  title = {HumAware-VAD: Humming-Aware Voice Activity Detection},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/CuriousMonkey7/HumAware-VAD}
}