steveheh committed · Commit a5c094a · verified · 1 Parent(s): 4e1dab1

Update README.md

Files changed (1): README.md (+15 −1)
README.md CHANGED
@@ -13,6 +13,16 @@ tags:
 
 # NVIDIA NEST XLarge En
 
+<style>
+img {
+    display: inline;
+}
+</style>
+
+[![Model architecture](https://img.shields.io/badge/Model_Arch-FastConformer-lightgrey#model-badge)](#model-architecture)
+| [![Model size](https://img.shields.io/badge/Params-600M-lightgrey#model-badge)](#model-architecture)
+
+
 The NEST framework is designed for speech self-supervised learning; the resulting encoder can be used as a frozen speech feature extractor or as weight initialization for downstream speech processing tasks. The NEST-XL model has about 600M parameters and is trained on an English dataset of roughly 100K hours. <br>
 This model is ready for commercial/non-commercial use. <br>
 
@@ -29,7 +39,11 @@ License to use this model is covered by the [CC-BY-4.0](https://creativecommons.
 
 ## Model Architecture
 
-**Architecture Type:** NEST [1] <br>
+The [NEST](https://arxiv.org/abs/2408.13106) framework comprises several building blocks, as illustrated in the left part of the following figure. Once trained, the NEST encoder can be used as weight initialization or as a feature extractor for downstream speech processing tasks.
+
+<div align="center">
+<img src="nest-model.png" width="1000" />
+</div>
 
 **Network Architecture:**
 - Encoder: FastConformer (24 layers)