JusenK committed · Commit 98e627b · verified · 1 Parent(s): 3b14c60

Update README.md

Files changed (1): README.md (+3 −1)
README.md CHANGED
@@ -8,4 +8,6 @@ language:
 
 Model of the paper [MoM: Linear Sequence Modeling with Mixture-of-Memories](https://arxiv.org/abs/2502.13685).
 
-The model was trained on a sample of SlimPajama with 15B tokens.
+The model was trained on a sample of SlimPajama with 15B tokens.
+
+Due to changes in the MLP layer structure in the latest version of fla, the weights cannot be loaded. You can use the version at [fla](https://github.com/fla-org/flash-linear-attention/tree/8346a33792558d8e3eb206fe18404de037e11d9c) instead.
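One way to pin the flash-linear-attention package to the compatible commit referenced above is to install it directly from that commit via pip. This is a sketch, not an instruction from the model authors; it assumes the repository installs as a standard pip package from git:

```shell
# Install fla pinned to the commit whose MLP layer structure
# matches this checkpoint's weights (commit from the README note).
pip install "git+https://github.com/fla-org/flash-linear-attention.git@8346a33792558d8e3eb206fe18404de037e11d9c"
```

If you already have a newer fla installed, uninstall it first (`pip uninstall flash-linear-attention`) so the pinned version takes effect.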