gallilmaimon committed · verified
Commit 0f9b7c2 · Parent(s): a10a98b

Update README.md

Files changed (1): README.md (+15 -5)

README.md CHANGED
@@ -17,7 +17,7 @@ This is a Speech Language Model trained for generating speech continuations over
 ## Model Details
 
 ### Model Description
-This is a Speech Language Model, introduced in "_Slamming_: Training a Speech Language Model on One GPU in a Day", focusing on efficient training.
+This is a Speech Language Model, introduced in "[_Slamming_: Training a Speech Language Model on One GPU in a Day](https://arxiv.org/abs/2502.15814)", focusing on efficient training.
 It was fine-tuned from [Qwen/Qwen2.5-0.5B](https://huggingface.co/Qwen/Qwen2.5-0.5B) over a vocabulary of 500 speech tokens extracted from
 the 11th layer of [mhubert-25hz](https://huggingface.co/slprl/mhubert-base-25hz). For a stronger version of the model trained with
 slightly more compute - 2*A100 for 2 days, see [slam_scaled](https://huggingface.co/slprl/slam_scaled).
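
Since the backbone described above remains a standard causal LM (Qwen2.5-0.5B) over a 500-unit speech-token vocabulary, generating a continuation can be sketched with plain `transformers`. This is a rough illustration only: the repo id and the unit-token format below are hypothetical assumptions, and the supported usage path is the slamkit repository listed under Model Sources.

```
# Rough sketch, NOT the official slamkit API. Assumes the checkpoint loads
# as a plain causal LM and that speech units surface as tokens like
# "<unit_17>"; both the repo id and this format are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

REPO_ID = "slprl/slam"  # hypothetical id, for illustration only

tokenizer = AutoTokenizer.from_pretrained(REPO_ID)
model = AutoModelForCausalLM.from_pretrained(REPO_ID, torch_dtype=torch.bfloat16)

# Prompt: a short sequence of mHuBERT unit ids (placeholder values).
prompt = "".join(f"<unit_{u}>" for u in [17, 402, 33, 98])
inputs = tokenizer(prompt, return_tensors="pt")

# Sample a speech continuation as further unit tokens; a unit-to-waveform
# vocoder (not shown) is needed to turn the units back into audio.
generated = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.95)
print(tokenizer.decode(generated[0]))
```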
@@ -34,7 +34,7 @@ The model was trained by next-token prediction over a subset of LibriSpeech, Lib
 ### Model Sources
 
 - **Repository:** [https://github.com/slp-rl/slamkit](https://github.com/slp-rl/slamkit)
-- **Paper:** [Soon!]
+- **Paper:** [https://arxiv.org/abs/2502.15814](https://arxiv.org/abs/2502.15814)
 - **Demo:** [Link](https://pages.cs.huji.ac.il/adiyoss-lab/slamming/)
 
 ## Uses
@@ -51,7 +51,7 @@ We refer users to the official repository for full usage explanations - [github
 
 
 ## Training Details
-We highly encourage users to read the full [paper]() for full training details; a brief overview is provided below.
+We highly encourage users to read the full [paper](https://arxiv.org/abs/2502.15814) for full training details; a brief overview is provided below.
 
 
 ### Training Data
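
The objective referenced above, next-token prediction over speech units, reduces to the standard shifted cross-entropy. A minimal sketch, assuming a batch of unit-id sequences; this is not the slamkit training loop itself.

```
# Minimal sketch of next-token prediction over speech units: the usual
# shifted cross-entropy, written out for a batch of unit-id sequences.
import torch
import torch.nn.functional as F

def next_unit_loss(logits: torch.Tensor, unit_ids: torch.Tensor) -> torch.Tensor:
    # logits: (batch, seq_len, vocab) from the LM; unit_ids: (batch, seq_len).
    shift_logits = logits[:, :-1, :]   # prediction at step t ...
    shift_labels = unit_ids[:, 1:]     # ... scored against the unit at t+1
    return F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
    )
```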
@@ -86,7 +86,7 @@ The paper provides full results; we give some results here and also refer to
 
 
 ### Compute Infrastructure
-This model was trained as part of ["*Slamming*: Training a Speech Language Model on One GPU in a Day"], focusing on efficient training.
+This model was trained as part of ["*Slamming*: Training a Speech Language Model on One GPU in a Day"](https://arxiv.org/abs/2502.15814), focusing on efficient training.
 
 #### Hardware
 This model was trained using **only a single Nvidia A5000 GPU**, 16 CPU cores and 24 GB of RAM for **24 hours**.
@@ -98,4 +98,14 @@ easy and efficient training of Speech Language Models.
 ## Citation
 
 **BibTeX:**
-Soon!
+```
+@misc{maimon2025slamming,
+    title={Slamming: Training a Speech Language Model on One GPU in a Day},
+    author={Gallil Maimon and Avishai Elmakies and Yossi Adi},
+    year={2025},
+    eprint={2502.15814},
+    archivePrefix={arXiv},
+    primaryClass={cs.LG},
+    url={https://arxiv.org/abs/2502.15814},
+}
+```