Update README.md
README.md CHANGED
@@ -36,7 +36,7 @@ tags:
</a>
</div>

- Doge uses Dynamic Mask Attention as sequence transformation and can use Multi-Layer Perceptron or Cross Domain Mixture of Experts as state transformation. Dynamic Mask Attention allows the Transformer to use self-attention during training and state space during inference, and Cross Domain Mixture of Experts can directly inherit the weights of Multi-Layer Perceptron for further training. This model is trained by [SmallDoge](https://huggingface.co/SmallDoge) community, for detailed algorithm and model architecture,
+ Doge uses Dynamic Mask Attention as sequence transformation and can use Multi-Layer Perceptron or Cross Domain Mixture of Experts as state transformation. Dynamic Mask Attention allows the Transformer to use self-attention during training and state space during inference, and Cross Domain Mixture of Experts can directly inherit the weights of Multi-Layer Perceptron for further training. This model is trained by the [SmallDoge](https://huggingface.co/SmallDoge) community; a paper with the detailed algorithm and model architecture is coming soon, and all training details and code are available in the [small-doge](https://github.com/SmallDoges/small-doge) repository.

## Uses
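For orientation only: the Uses section referenced by the next hunk already ends in an `outputs = model.generate(` call, so the card has its own canonical snippet. A minimal load-and-generate sketch for one of the instruct checkpoints listed below might look like the following; the repo id, the chat-template call, and the need for `trust_remote_code=True` are assumptions rather than content of this diff.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id; the card's own Uses section is authoritative.
repo_id = "SmallDoge/Doge-160M-Instruct-SFT"

# Doge ships custom modeling code, so trust_remote_code=True is assumed to be required.
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

# Build a chat-formatted prompt and generate a short reply.
conversation = [{"role": "user", "content": "Hi, how are you doing today?"}]
inputs = tokenizer.apply_chat_template(conversation, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```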
@@ -81,7 +81,7 @@ outputs = model.generate(
## Model Details

- We build the Doge-Instruct by
+ We build the Doge-Instruct-SFT by SFT on [SmolTalk](https://huggingface.co/datasets/HuggingFaceTB/smoltalk).

**SFT**:
| Model | Training Data | Epochs | Content Length | LR | Batch Size | Precision |
@@ -91,6 +91,7 @@ We build the Doge-Instruct by first SFT on [SmolTalk](https://huggingface.co/dat
| [Doge-160M-Instruct-SFT](https://huggingface.co/SmallDoge/Doge-160M-Instruct-SFT) | [smoltalk](https://huggingface.co/datasets/HuggingFaceTB/smoltalk) | 2 | 2048 | 4e-4 | 0.25M | bfloat16 |
| [Doge-320M-Instruct-SFT](https://huggingface.co/SmallDoge/Doge-320M-Instruct-SFT) | [smoltalk](https://huggingface.co/datasets/HuggingFaceTB/smoltalk) | 2 | 2048 | 2e-4 | 0.25M | bfloat16 |

+
**Procedure**:

**SFT**:
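As a rough illustration of how the SFT hyperparameters in the table above (2 epochs, 2048 content length, 4e-4 or 2e-4 LR, bfloat16, 0.25M-token batches) might be wired up, here is a sketch using TRL's `SFTTrainer`. The base checkpoint, the SmolTalk config name, the per-device batch sizing, the scheduler, and the exact trl argument names are assumptions; the authoritative recipe lives in the [small-doge](https://github.com/SmallDoges/small-doge) repository.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer

# Assumed base checkpoint and dataset config; the small-doge repo holds the real recipe.
base = "SmallDoge/Doge-160M"
model = AutoModelForCausalLM.from_pretrained(base, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(base, trust_remote_code=True)
dataset = load_dataset("HuggingFaceTB/smoltalk", "all", split="train")

config = SFTConfig(
    output_dir="Doge-160M-Instruct-SFT",
    num_train_epochs=2,               # Epochs column
    learning_rate=4e-4,               # LR column
    bf16=True,                        # Precision column
    max_seq_length=2048,              # Content Length column (renamed max_length in newer trl releases)
    per_device_train_batch_size=8,    # illustrative; the card's 0.25M is tokens per global batch
    gradient_accumulation_steps=16,   # illustrative
    lr_scheduler_type="cosine",       # assumption, not stated in the table
)

trainer = SFTTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,       # use tokenizer= on older trl versions
)
trainer.train()
```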