steveheh committed · Commit a5c094a · verified · 1 Parent(s): 4e1dab1

Update README.md

Files changed (1): README.md (+15 −1)
README.md CHANGED
@@ -13,6 +13,16 @@ tags:
 
 # NVIDIA NEST XLarge En
 
+<style>
+img {
+    display: inline;
+}
+</style>
+
+[![Model architecture](https://img.shields.io/badge/Model_Arch-FastConformer-lightgrey#model-badge)](#model-architecture)
+| [![Model size](https://img.shields.io/badge/Params-600M-lightgrey#model-badge)](#model-architecture)
+
+
 The NEST framework is designed for speech self-supervised learning; the resulting encoder can be used as a frozen speech feature extractor or as weight initialization for downstream speech processing tasks. The NEST-XL model has about 600M parameters and is trained on an English dataset of roughly 100K hours. <br>
 This model is ready for commercial/non-commercial use. <br>
 
@@ -29,7 +39,11 @@ License to use this model is covered by the [CC-BY-4.0](https://creativecommons.
 
 ## Model Architecture
 
-**Architecture Type:** NEST [1] <br>
+The [NEST](https://arxiv.org/abs/2408.13106) framework comprises several building blocks, as illustrated in the left part of the following figure. Once trained, the NEST encoder can be used as weight initialization or as a feature extractor for downstream speech processing tasks.
+
+<div align="center">
+<img src="nest-model.png" width="1000" />
+</div>
 
 **Network Architecture:**
 - Encoder: FastConformer (24 layers)