WORK IN PROGRESS

[WIP] HumAware-VAD: Humming-Aware Voice Activity Detection

📌 Overview

HumAware-VAD is a fine-tuned version of the Silero-VAD model, trained to distinguish humming from actual speech. Standard Voice Activity Detection (VAD) models, including Silero-VAD, often misclassify humming as speech, which produces inaccurate speech segments. HumAware-VAD addresses this by fine-tuning on a custom dataset, HumSpeechBlend, so that speech is still detected accurately when humming is present.

🎯 Purpose

The primary goals of HumAware-VAD are to:

Reduce false positives where humming is mistakenly detected as speech.
Enhance speech segmentation accuracy in real-world applications.
Improve VAD performance for tasks involving music, background noise, and vocal sounds.
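The first goal can be made measurable. The sketch below is one possible way to quantify it, not the evaluation used for this model: it computes the fraction of humming frames that a detector marks as speech. The function name and the per-frame label strings ("speech", "hum", "silence") are assumptions made for illustration and are not the HumSpeechBlend annotation format.

```python
from typing import Sequence


def humming_false_positive_rate(
    predicted_speech: Sequence[bool], labels: Sequence[str]
) -> float:
    """Fraction of frames labelled 'hum' that the detector marked as speech.

    Both arguments are hypothetical per-frame sequences of equal length.
    """
    if len(predicted_speech) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    # Keep only the predictions made on humming frames.
    hum_predictions = [
        pred for pred, label in zip(predicted_speech, labels) if label == "hum"
    ]
    if not hum_predictions:
        return 0.0  # no humming frames, so there is nothing to misclassify
    return sum(hum_predictions) / len(hum_predictions)


# Made-up example: 4 humming frames, 1 of them flagged as speech.
labels = ["speech", "hum", "hum", "silence", "hum", "hum"]
preds = [True, True, False, False, False, False]
print(humming_false_positive_rate(preds, labels))  # 0.25
```

Comparing this rate for the base model and the fine-tuned model on the same humming clips would show whether the fine-tuning reduces false positives.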

🗂️ Model Details

Base Model: Silero-VAD
Fine-tuning Dataset: HumSpeechBlend
Format: JIT (TorchScript)
Framework: PyTorch
Inference Speed: Real-time

📥 Download & Usage

🔹 Install Dependencies

pip install torch torchaudio

🔹 Load the Model

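import torch

def load_humaware_vad(model_path="humaware_vad.jit"):
    # Load the TorchScript (JIT) checkpoint; no Python model class is required.
    model = torch.jit.load(model_path)
    # Switch to evaluation mode for inference.
    model.eval()
    return model

# Assumes humaware_vad.jit has been downloaded to the working directory.
vad_model = load_humaware_vad()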

🔹 Run Inference

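import torchaudio

# Load the audio file; waveform has shape (channels, samples).
waveform, sample_rate = torchaudio.load("data/0000.wav")
# Note: the base Silero-VAD TorchScript model is normally called as
# model(chunk, sample_rate) on short 16 kHz mono chunks. If the call below
# fails, this model may expect the same interface.
out = vad_model(waveform)
print("VAD Output:", out)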
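The format of the model output is not documented here. Assuming it can be reduced to one speech probability per fixed-size frame, as with the base Silero-VAD model, the probabilities can be turned into speech segments with a simple threshold. The following is a minimal sketch under that assumption: the function name, the 0.5 threshold, and the defaults of 512 samples per frame at 16 kHz are illustrative choices, not values confirmed for HumAware-VAD.

```python
from typing import List, Sequence, Tuple


def probs_to_segments(
    probs: Sequence[float],
    threshold: float = 0.5,      # assumed decision threshold
    frame_samples: int = 512,    # assumed frame size in samples
    sample_rate: int = 16000,    # assumed sampling rate in Hz
) -> List[Tuple[float, float]]:
    """Merge consecutive frames at or above the threshold into (start, end) seconds."""
    frame_sec = frame_samples / sample_rate
    segments: List[Tuple[float, float]] = []
    start = None  # index of the first frame of the currently open segment
    for i, p in enumerate(probs):
        if p >= threshold and start is None:
            start = i  # speech begins
        elif p < threshold and start is not None:
            segments.append((start * frame_sec, i * frame_sec))  # speech ends
            start = None
    if start is not None:
        # Close a segment that runs to the end of the audio.
        segments.append((start * frame_sec, len(probs) * frame_sec))
    return segments


# Hypothetical per-frame probabilities, not real model output.
example_probs = [0.1, 0.8, 0.9, 0.2, 0.7]
print(probs_to_segments(example_probs))
```

A production pipeline would usually also enforce minimum speech and silence durations so that single-frame spikes do not create spurious segments.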

📄 Citation

If you use this model, please cite it accordingly.

@model{HumAwareVAD2025,
  author = {Sourabh Saini},
  title = {HumAware-VAD: Humming-Aware Voice Activity Detection},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/CuriousMonkey7/HumAware-VAD}
}

