# Update README.md
### Model Sources

- **Repository:** [https://github.com/slp-rl/slamkit](https://github.com/slp-rl/slamkit)
- **Paper:** [Soon!]
- **Demo:** [Link](https://pages.cs.huji.ac.il/adiyoss-lab/slamming/)

## Uses

This is a base SpeechLM and as such can be used to generate continuations for speech segments, or as a base for further tuning. See the _SlamKit_ [codebase](https://github.com/slp-rl/slamkit) for more details on usage, and check out the [demo page](https://pages.cs.huji.ac.il/adiyoss-lab/slamming/) for some generation examples.

### Out-of-Scope Use

This model was trained on curated speech datasets which contain mainly audiobooks and stories; as such, the outputs should not be treated as factual in any way.

## How to Get Started with the Model

We refer users to the official repository for full usage explanations - [github](https://github.com/slp-rl/slamkit).
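
As a rough sketch (not the official API), the checkpoint may load as a standard causal LM over speech-unit tokens. The model ID, tokenizer behaviour, and `<unit_k>` token format below are all assumptions; prefer the loading and generation utilities in the repository:

```python
# Hypothetical sketch: model ID and unit-token surface form are assumptions,
# not the documented slamkit interface.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "slprl/slam"  # placeholder; use the actual checkpoint name from the hub
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

# A prompt is a sequence of de-duplicated HuBERT speech units (see Preprocessing).
prompt = "".join(f"<unit_{u}>" for u in [12, 87, 3, 41])
inputs = tokenizer(prompt, return_tensors="pt")

# Generate a continuation as unit tokens; a vocoder is needed to synthesise audio.
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_k=50)
print(tokenizer.decode(out[0]))
```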

## Training Details

### Training Procedure

This model was trained by next-token prediction over several datasets, and then trained with DPO over [SpokenSwag](https://huggingface.co/datasets/slprl/SpokenSwag).
Please refer to the [paper]() or [code](https://github.com/slp-rl/slamkit) for the full training recipes.
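
For intuition only, the first stage amounts to standard next-token prediction over unit sequences. A minimal PyTorch sketch, assuming an HF-style model that returns `.logits`:

```python
# Minimal sketch of the first training stage (next-token prediction).
# The model interface and shapes are assumptions, not the slamkit recipe.
import torch.nn.functional as F

def next_token_loss(model, units):
    # units: LongTensor of shape (batch, seq_len) holding speech-unit ids.
    logits = model(units).logits                         # (batch, seq_len, vocab)
    pred = logits[:, :-1].reshape(-1, logits.size(-1))   # position t predicts t+1
    target = units[:, 1:].reshape(-1)
    return F.cross_entropy(pred, target)
```

The DPO stage then optimises the model on chosen/rejected continuation pairs from SpokenSwag; see the codebase for the actual recipe.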

#### Preprocessing

Speech tokens are extracted from the audio using [Hubert-25hz](https://huggingface.co/slprl/mhubert-base-25hz), and quantised using the official kmeans released with the model in [textlesslib](https://github.com/facebookresearch/textlesslib/tree/main). Units are de-duplicated.
We encourage you to explore the official repository for full details - [github](https://github.com/slp-rl/slamkit).
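
The sketch below illustrates that pipeline. It assumes the checkpoint loads with 🤗transformers' `HubertModel` (it may instead require textlesslib) and that the k-means centroids have been exported to a local tensor; the official implementation lives in textlesslib:

```python
# Hedged sketch of: HuBERT features -> k-means quantisation -> de-duplication.
# Checkpoint/centroid loading details are assumptions; see textlesslib for the real pipeline.
import itertools
import torch
import torchaudio
from transformers import HubertModel

hubert = HubertModel.from_pretrained("slprl/mhubert-base-25hz")  # assumed loadable this way
hubert.eval()

wav, sr = torchaudio.load("example.wav")
wav = wav.mean(dim=0, keepdim=True)                    # downmix to mono: (1, samples)
wav = torchaudio.functional.resample(wav, sr, 16_000)  # HuBERT expects 16 kHz audio

with torch.no_grad():
    feats = hubert(wav).last_hidden_state.squeeze(0)   # (frames, dim) at ~25 Hz

# centroids: (n_clusters, dim) tensor exported from the released k-means model.
centroids = torch.load("kmeans_centroids.pt")          # placeholder path
units = torch.cdist(feats, centroids).argmin(dim=-1).tolist()

# De-duplicate consecutive repeats, as described above.
units = [u for u, _ in itertools.groupby(units)]
```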

## Evaluation

#### Hardware

This model was trained using **only a single Nvidia A5000 GPU**, 16 CPU cores and 24 GB of RAM for **24 hours**.

#### Software

The model was trained using the [*SlamKit*](https://github.com/slp-rl/slamkit) codebase, which builds upon 🤗transformers, extending it to support easy and efficient training of Speech Language Models.

## Citation