Safetensors · English · gsa
Commit e87d6f3 (verified), committed by JusenK · 1 parent: 38a80fa

Update README.md

Files changed (1):
  1. README.md +3 -1
README.md CHANGED
@@ -8,4 +8,6 @@ language:
 
 Model of the paper [MoM: Linear Sequence Modeling with Mixture-of-Memories](https://arxiv.org/abs/2502.13685) and [Gated Slot Attention for Efficient Linear-Time Sequence Modeling](https://arxiv.org/abs/2409.07146).
 
-The model was trained on a sample of SlimPajama with 15B tokens.
+The model was trained on a sample of SlimPajama with 15B tokens.
+
+Due to changes in the MLP layer structure in the latest version of fla, the weights cannot be loaded. You can use the version at [fla](https://github.com/fla-org/flash-linear-attention/tree/8346a33792558d8e3eb206fe18404de037e11d9c) instead.
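The added README line points to a specific fla commit through a GitHub tree URL. As a minimal sketch (not part of the original commit), that URL can be turned into a pinned pip install command, assuming pip's `git+https://<host>/<owner>/<repo>@<rev>` VCS URL syntax. The helper names `pinned_vcs_url` and `pip_install_command` are hypothetical, and the command is only built here, not executed.

```python
import shlex
import sys
from urllib.parse import urlparse

# Tree URL quoted in the README change (pins fla to one commit).
FLA_TREE_URL = (
    "https://github.com/fla-org/flash-linear-attention/"
    "tree/8346a33792558d8e3eb206fe18404de037e11d9c"
)


def pinned_vcs_url(tree_url: str) -> str:
    """Convert a GitHub .../<owner>/<repo>/tree/<rev> URL into a pip VCS URL.

    Hypothetical helper: assumes <rev> is a commit hash (branch names that
    contain '/' are not handled).
    """
    parsed = urlparse(tree_url)
    parts = parsed.path.strip("/").split("/")
    # Expected layout: <owner>/<repo>/tree/<rev>
    if len(parts) != 4 or parts[2] != "tree":
        raise ValueError(f"not a GitHub tree URL: {tree_url}")
    owner, repo, _, rev = parts
    return f"git+https://{parsed.netloc}/{owner}/{repo}@{rev}"


def pip_install_command(requirement: str) -> list[str]:
    """Build (but do not run) a pip command for the current interpreter."""
    return [sys.executable, "-m", "pip", "install", requirement]


if __name__ == "__main__":
    url = pinned_vcs_url(FLA_TREE_URL)
    # Print a shell-quoted command the user can copy and run manually.
    print(shlex.join(pip_install_command(url)))
```

Using the bare VCS URL (rather than a `name @ url` requirement) avoids having to guess the distribution name declared by the project at that commit.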