Transformer-340M / README.md
license: apache-2.0
datasets:
  - cerebras/SlimPajama-627B
language:
  - en

This model accompanies the paper "MoM: Linear Sequence Modeling with Mixture-of-Memories".

The model was trained on a 15B-token sample of SlimPajama.

Because the MLP layer structure changed in the latest version of fla, these weights cannot be loaded with that version. Use the compatible version of fla instead.
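
To check whether an installed fla version matches this checkpoint, one option is to compare the parameter names stored in the checkpoint with the names the instantiated model expects. The sketch below shows only the comparison step. The helper `diff_state_dict_keys` and every parameter name in the example are hypothetical and are not taken from fla or from this checkpoint.

```python
def diff_state_dict_keys(checkpoint_keys, model_keys):
    """Compare parameter names from a checkpoint with those a model expects.

    Returns a dict with:
      - "unexpected": names present in the checkpoint but not in the model
      - "missing":    names the model expects but the checkpoint lacks
    """
    checkpoint_keys = set(checkpoint_keys)
    model_keys = set(model_keys)
    return {
        "unexpected": sorted(checkpoint_keys - model_keys),
        "missing": sorted(model_keys - checkpoint_keys),
    }


# Hypothetical parameter names, used only to illustrate a renamed MLP layer.
checkpoint_keys = [
    "model.layers.0.mlp.gate_proj.weight",
    "model.layers.0.mlp.down_proj.weight",
]
model_keys = [
    "model.layers.0.mlp.swiglu_linear.weight",
    "model.layers.0.mlp.down_proj.weight",
]

report = diff_state_dict_keys(checkpoint_keys, model_keys)
for kind, names in report.items():
    print(kind, names)
```

In practice, the checkpoint names could be read from the safetensors file and the model names from `model.state_dict().keys()`. If both lists come back empty, the layer structure matches. If MLP parameters appear under "unexpected" or "missing", that may indicate the version mismatch described above.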