Audio-to-Audio · Transformers · Safetensors · speech_language_model · Inference Endpoints
nielsr (HF staff) committed · Commit b09eb4e · verified · 1 Parent(s): 0f9b7c2
Files changed (1): README.md (+10 -9)
README.md CHANGED
@@ -10,14 +10,15 @@ base_model:
 pipeline_tag: audio-to-audio
 ---
 
-# Model Card for Model ID
-This is a Speech Lanaguage Model trained for generating speech contiuations over discrete [Hubert tokens](https://huggingface.co/slprl/mhubert-base-25hz).
+# Model Card for SLAM
+
+This is a Speech Language Model trained for generating speech continuations over discrete [Hubert tokens](https://huggingface.co/slprl/mhubert-base-25hz).
 
 
 ## Model Details
 
 ### Model Description
-This is a Speech Lanaguage Model, introduced in "[_Slamming_: Training a Speech Language Model on One GPU in a Day](https://arxiv.org/abs/2502.15814)", focusing on efficient training.
+This is a Speech Language Model, introduced in "[_Slamming_: Training a Speech Language Model on One GPU in a Day](https://arxiv.org/abs/2502.15814)", focusing on efficient training.
 It was fine-tuned from [Qwen/Qwen2.5-0.5B](https://huggingface.co/Qwen/Qwen2.5-0.5B) over a vocabulary of 500 speech tokens extracted from
 the 11th layer of [mhubert-25hz](https://huggingface.co/slprl/mhubert-base-25hz). For a stronger version of the model trained with
 slightly more compute (2×A100 for 2 days), see [slam_scaled](https://huggingface.co/slprl/slam_scaled).
@@ -35,10 +36,10 @@ The model was trained by next-token prediction over a subset of LibriSpeech, Lib
 
 - **Repository:** [https://github.com/slp-rl/slamkit](https://github.com/slp-rl/slamkit)
 - **Paper:** [https://arxiv.org/abs/2502.15814](https://arxiv.org/abs/2502.15814)
-- **Demo:** [Link](https://pages.cs.huji.ac.il/adiyoss-lab/slamming/)
+- **Demo:** [https://pages.cs.huji.ac.il/adiyoss-lab/slamming/](https://pages.cs.huji.ac.il/adiyoss-lab/slamming/)
 
 ## Uses
-This is a base SpeechLM and as such can be used to generate contiuations for speech segments, or as base for further tuning. See the _SlamKit_
+This is a base SpeechLM and as such can be used to generate continuations for speech segments, or as a base for further tuning. See the _SlamKit_
 [codebase](https://github.com/slp-rl/slamkit) for more details on usage, and check out the [demo page](https://pages.cs.huji.ac.il/adiyoss-lab/slamming/) for some generation examples.
 
 ### Out-of-Scope Use
@@ -47,7 +48,7 @@ This model was trained on curated speech datasets which contain mainly audio-boo
 
 
 ## How to Get Started with the Model
-We refer users to the official repository for full usage explainations - [github](https://github.com/slp-rl/slamkit).
+We refer users to the official repository for full usage explanations: [GitHub](https://github.com/slp-rl/slamkit).
 
 
 ## Training Details
@@ -61,7 +62,7 @@ This model was trained on a subset of [LibriSpeech](https://huggingface.co/datas
 dataset [SpokenSwag](https://huggingface.co/datasets/slprl/SpokenSwag).
 
 ### Training Procedure
-This model was trained by next token prediction over several dataset, and then trained with DPO over [SpokenSwag](https://huggingface.co/datasets/slprl/SpokenSwag).
+This model was trained by next-token prediction over several datasets, and then trained with DPO over [SpokenSwag](https://huggingface.co/datasets/slprl/SpokenSwag).
 Please refer to the [paper](https://arxiv.org/abs/2502.15814) or [code](https://github.com/slp-rl/slamkit) for the full training recipes.
 
 #### Preprocessing
@@ -92,8 +93,8 @@ This model was trained as part of ["*Slamming*: Training a Speech Language Model
 This model was trained using **only a single Nvidia A5000 GPU**, 16 CPU cores and 24 GB of RAM for **24 hours**.
 
 #### Software
-The model was trained using the [*SlamKit*](https://github.com/slp-rl/slamkit) codebase which builds upon 🤗transformers extending it to support
-easy and efficent training of Speech Language Models.
+The model was trained using the [*SlamKit*](https://github.com/slp-rl/slamkit) codebase, which builds upon 🤗transformers, extending it to support
+easy and efficient training of Speech Language Models.
 
 ## Citation
 
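The card describes the model's input as discrete units: features from the 11th layer of mhubert-25hz, quantized into a 500-entry vocabulary. The sketch below illustrates that pipeline under stated assumptions only: that the checkpoint loads as a stock 🤗 `HubertModel`, and that a `kmeans_500.pt` codebook file (a hypothetical name) holds the 500 centroids. The actual tokenizer implementation lives in the SlamKit repository.

```python
# Hedged sketch of the unit tokenization the card describes: layer-11 features
# from mhubert-25hz, quantized against a 500-entry k-means codebook.
# Assumptions: the checkpoint loads as a stock HubertModel, and "kmeans_500.pt"
# (hypothetical file) holds the (500, D) centroids. See SlamKit for the real thing.
import torch
import torchaudio
from transformers import HubertModel

hubert = HubertModel.from_pretrained("slprl/mhubert-base-25hz")  # assumed loadable this way
hubert.eval()

wav, sr = torchaudio.load("prompt.wav")
wav = torchaudio.functional.resample(wav.mean(dim=0, keepdim=True), sr, 16_000)  # mono, 16 kHz

with torch.no_grad():
    # hidden_states[11] = output of the 11th transformer layer (index 0 is the conv frontend)
    feats = hubert(wav, output_hidden_states=True).hidden_states[11]  # (1, T, D)

centroids = torch.load("kmeans_500.pt")  # hypothetical (500, D) codebook
units = torch.cdist(feats, centroids.unsqueeze(0)).argmin(dim=-1)  # (1, T) ids in [0, 500)
units = torch.unique_consecutive(units)  # collapse repeated units, as is common for unit LMs
print(units.tolist())
```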
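For getting started, the card defers to the SlamKit repository. As rough orientation only, here is a minimal generation sketch: since the model was fine-tuned from Qwen2.5-0.5B, it is assumed to load as an ordinary causal LM; the repo id `slprl/slam` and the `<unit_i>` token format are placeholders, not documented interfaces.

```python
# Hedged sketch of speech continuation, assuming the checkpoint behaves as a
# standard causal LM over unit tokens. The repo id "slprl/slam" and the
# "<unit_i>" rendering are assumptions; SlamKit is the authoritative reference.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "slprl/slam"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt_units = [17, 402, 93, 93, 255]  # placeholder mhubert-25hz unit ids
prompt = "".join(f"<unit_{u}>" for u in prompt_units)  # assumed token format

inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=250, do_sample=True, top_p=0.95)
continuation = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:])
print(continuation)
```

The decoded output is a unit string, not audio; synthesizing a waveform from it requires a unit vocoder, which SlamKit also handles.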
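The training procedure is next-token prediction followed by DPO over SpokenSwag. The exact recipe is in the paper and SlamKit; as a generic stand-in, a DPO pass with TRL might look like the following, where the `prompt`/`chosen`/`rejected` column names and the train split are assumptions about how SpokenSwag is laid out.

```python
# Generic DPO fine-tuning sketch with TRL, standing in for the paper's recipe
# (the real one is in SlamKit). Assumptions: SpokenSwag exposes "prompt",
# "chosen", "rejected" columns of unit strings; "slprl/slam" is the base checkpoint.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "slprl/slam"  # assumed checkpoint id
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

dataset = load_dataset("slprl/SpokenSwag", split="train")  # assumed split name

args = DPOConfig(
    output_dir="slam-dpo",
    per_device_train_batch_size=4,
    learning_rate=5e-7,
    beta=0.1,  # KL-regularization strength; the paper's value may differ
)
trainer = DPOTrainer(
    model=model,           # a frozen reference copy is created automatically
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
```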