naymaraq committed on
Commit 9e8c199 · verified · 1 Parent(s): b2804d8

Update README.md

Files changed (1)
  1. README.md +4 -5
README.md CHANGED
@@ -16,25 +16,25 @@ Deployment Geography: Global <br>
 Use Case: Developers, speech processing engineers, and AI researchers will use it as the first step for other speech processing models. <br>
 
 
-## Reference
+## References:
 [1] Jia, Fei, Somshubra Majumdar, and Boris Ginsburg. "MarbleNet: Deep 1D Time-Channel Separable Convolutional Neural Network for Voice Activity Detection." ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2021. <br>
 [2] [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo)
 <br>
 
-## Model Architecture
+## Model Architecture:
 
 **Architecture Type:** Convolutional Neural Network (CNN) <br>
 **Network Architecture:** MarbleNet <br>
 
 **This model has 91.5K of model parameters** <br>
 
-### Input
+## Input: <br>
 **Input Type(s):** Audio <br>
 **Input Format:** .wav files <br>
 **Input Parameters:** 1D <br>
 **Other Properties Related to Input:** 16000 Hz Mono-channel Audio, Pre-Processing Not Needed <br>
 
-### Output:
+## Output: <br>
 **Output Type(s):** Sequence of speech probabilities for each 20 millisecond frame <br>
 **Output Format:** Float Array <br>
 **Output Parameters:** 1D <br>
@@ -52,7 +52,6 @@ TODO
 **Runtime Engine(s):**
 * NeMo-2.0.0 <br>
 
-
 **Supported Hardware Microarchitecture Compatibility:** <br>
 * [NVIDIA Ampere] <br>
 * [NVIDIA Blackwell] <br>
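The README's Output section describes a 1-D float array with one speech probability per 20 millisecond frame. A minimal sketch of how such an array could be turned into speech segments by thresholding (plain Python, independent of NeMo; the function name, the 0.5 threshold, and the lack of smoothing/padding are illustrative assumptions, not part of the model card):

```python
def probs_to_segments(probs, frame_ms=20, threshold=0.5):
    """Convert per-frame speech probabilities (one value per `frame_ms`
    milliseconds, as produced by a frame-level VAD model) into a list of
    (start_ms, end_ms) speech segments.

    `threshold=0.5` is an illustrative default, not a documented value.
    """
    segments = []
    start = None  # index of the frame where the current speech run began
    for i, p in enumerate(probs):
        if p >= threshold and start is None:
            start = i  # speech run opens at this frame
        elif p < threshold and start is not None:
            segments.append((start * frame_ms, i * frame_ms))
            start = None
    if start is not None:  # speech run extends to the end of the audio
        segments.append((start * frame_ms, len(probs) * frame_ms))
    return segments


# Example: 8 frames = 160 ms of audio; frames 2-5 exceed the threshold.
probs = [0.1, 0.2, 0.9, 0.95, 0.8, 0.7, 0.1, 0.05]
print(probs_to_segments(probs))  # [(40, 120)]
```

Real pipelines typically add onset/offset smoothing and minimum-duration filtering on top of raw thresholding; this sketch only shows the basic frame-to-timestamp conversion implied by the 20 ms frame rate above.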