Audio-to-Audio · Transformers · Safetensors · speech_language_model

gallilmaimon committed · Commit a10a98b · verified · 1 Parent(s): b003bad

Update README.md

Files changed (1)
  1. README.md +7 -7

README.md CHANGED
@@ -33,13 +33,13 @@ The model was trained by next-token prediction over a subset of LibriSpeech, Lib
 
 ### Model Sources
 
-- **Repository:** [https://github.com/slp-rl/slam](https://github.com/slp-rl/slam)
 - **Paper:** [Soon!]
 - **Demo:** [Link](https://pages.cs.huji.ac.il/adiyoss-lab/slamming/)
 
 ## Uses
-This is a base SpeechLM and as such can be used to generate continuations for speech segments, or as a base for further tuning. See the _slam_
-[codebase](https://github.com/slp-rl/slam) for more details on usage, and check out the [demo page](https://pages.cs.huji.ac.il/adiyoss-lab/slamming/) for some generation examples.
 
 ### Out-of-Scope Use
 This model was trained on curated speech datasets which contain mainly audiobooks and stories; as such, the outputs should not be treated as factual in any way.
@@ -47,7 +47,7 @@ This model was trained on curated speech datasets which contain mainly audio-boo
 
 
 ## How to Get Started with the Model
-We refer users to the official repository for full usage explanations - [github](https://github.com/slp-rl/slam).
 
 
 ## Training Details
@@ -62,12 +62,12 @@ dataset [SpokenSwag](https://huggingface.co/datasets/slprl/SpokenSwag).
 
 ### Training Procedure
 This model was trained by next-token prediction over several datasets, and then trained with DPO over [SpokenSwag](https://huggingface.co/datasets/slprl/SpokenSwag).
-Please refer to the [paper]() or [code](https://github.com/slp-rl/slam) for the full training recipes.
 
 #### Preprocessing
 Speech tokens are extracted from the audio using [Hubert-25hz](https://huggingface.co/slprl/mhubert-base-25hz), and quantised using the
 official kmeans released with the model in [textlesslib](https://github.com/facebookresearch/textlesslib/tree/main). Units are de-duplicated.
-We encourage you to explore the official repository for full details - [github](https://github.com/slp-rl/slam).
 
 
 ## Evaluation
@@ -92,7 +92,7 @@ This model was trained as part of ["*Slamming*: Training a Speech Language Model
 This model was trained using **only a single Nvidia A5000 GPU**, 16 CPU cores and 24 GB of RAM for **24 hours**.
 
 #### Software
-The model was trained using the [*Slam*](https://github.com/slp-rl/slam) codebase which builds upon 🤗transformers, extending it to support
 easy and efficient training of Speech Language Models.
 
 ## Citation
 
 
 ### Model Sources
 
+- **Repository:** [https://github.com/slp-rl/slamkit](https://github.com/slp-rl/slamkit)
 - **Paper:** [Soon!]
 - **Demo:** [Link](https://pages.cs.huji.ac.il/adiyoss-lab/slamming/)
 
 ## Uses
+This is a base SpeechLM and as such can be used to generate continuations for speech segments, or as a base for further tuning. See the _SlamKit_
+[codebase](https://github.com/slp-rl/slamkit) for more details on usage, and check out the [demo page](https://pages.cs.huji.ac.il/adiyoss-lab/slamming/) for some generation examples.
 
 ### Out-of-Scope Use
 This model was trained on curated speech datasets which contain mainly audiobooks and stories; as such, the outputs should not be treated as factual in any way.
 
 
 
 ## How to Get Started with the Model
+We refer users to the official repository for full usage explanations - [github](https://github.com/slp-rl/slamkit).
 
 
 ## Training Details
 
 
 ### Training Procedure
 This model was trained by next-token prediction over several datasets, and then trained with DPO over [SpokenSwag](https://huggingface.co/datasets/slprl/SpokenSwag).
+Please refer to the [paper]() or [code](https://github.com/slp-rl/slamkit) for the full training recipes.
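For intuition about the DPO step, the sketch below computes the standard DPO objective for a single preference pair from summed sequence log-probabilities. This is an illustration of the general DPO loss, not SlamKit's implementation; the function name, `beta` value, and toy numbers are invented for the example.

```python
import math

def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair.

    Each argument is the summed log-probability of a full response
    (here: a speech-unit sequence) under the policy or reference model.
    """
    # Implicit reward margin: how much more the policy prefers the chosen
    # response over the rejected one, relative to the reference model.
    margin = (policy_logp_chosen - ref_logp_chosen) - \
             (policy_logp_rejected - ref_logp_rejected)
    # -log(sigmoid(beta * margin)); shrinks as the margin grows.
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Toy log-probabilities where the policy already prefers the chosen response.
loss = dpo_loss(-10.0, -14.0, -12.0, -13.0, beta=0.1)
```

Minimizing this loss pushes the policy to assign relatively higher probability to the preferred (chosen) speech continuation than the reference model does.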
 
 #### Preprocessing
 Speech tokens are extracted from the audio using [Hubert-25hz](https://huggingface.co/slprl/mhubert-base-25hz), and quantised using the
 official kmeans released with the model in [textlesslib](https://github.com/facebookresearch/textlesslib/tree/main). Units are de-duplicated.
+We encourage you to explore the official repository for full details - [github](https://github.com/slp-rl/slamkit).
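De-duplication here means collapsing consecutive repeats of the same quantised unit into a single token. A minimal sketch of that idea (the actual SlamKit/textlesslib code may differ; the function name is invented):

```python
from itertools import groupby

def deduplicate_units(units):
    """Collapse runs of identical speech units into one token each.

    E.g. [52, 52, 52, 7, 7, 13, 52] -> [52, 7, 13, 52]: consecutive
    duplicates are merged, but later reoccurrences of a unit are kept.
    """
    return [unit for unit, _run in groupby(units)]
```

This shortens the unit sequences the LM must model while preserving the order in which distinct units occur.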
 
 
  ## Evaluation
 
 This model was trained using **only a single Nvidia A5000 GPU**, 16 CPU cores and 24 GB of RAM for **24 hours**.
 
 #### Software
+The model was trained using the [*SlamKit*](https://github.com/slp-rl/slamkit) codebase which builds upon 🤗transformers, extending it to support
 easy and efficient training of Speech Language Models.
 
 ## Citation