---
license: apache-2.0
datasets:
  - cerebras/SlimPajama-627B
language:
  - en
---

This is the model from the paper "MoM: Linear Sequence Modeling with Mixture-of-Memories".

The model was trained on a 100B-token sample of SlimPajama. It uses Gated DeltaNet as the memory update mechanism.
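
To give a feel for what a gated delta-rule memory update does, here is a minimal pure-Python sketch of a single update step on one memory matrix. It is an illustration under assumptions, not this model's implementation: the function names `gated_delta_step` and `read` are hypothetical, the update form `S_new = alpha * (S - beta * (S k) k^T) + beta * v k^T` reflects the gated delta rule as we understand it, and the routing of tokens across multiple memories described in the MoM paper is not shown.

```python
def gated_delta_step(S, k, v, alpha, beta):
    """One gated delta-rule update of a memory matrix S (d_v x d_k).

    Assumed form: S_new = alpha * (S - beta * (S k) k^T) + beta * v k^T
    alpha in [0, 1] decays the old memory, beta in [0, 1] is the write strength.
    The key k is assumed to be unit-norm.
    """
    # Value currently stored under key k, i.e. the matrix-vector product S k.
    Sk = [sum(row[j] * k[j] for j in range(len(k))) for row in S]
    return [
        [alpha * (S[i][j] - beta * Sk[i] * k[j]) + beta * v[i] * k[j]
         for j in range(len(k))]
        for i in range(len(v))
    ]


def read(S, q):
    """Read from memory with query q: returns S q."""
    return [sum(row[j] * q[j] for j in range(len(q))) for row in S]


# Illustrative values only (not taken from the model).
S = [[0.0, 0.0], [0.0, 0.0]]
k = [1.0, 0.0]

S = gated_delta_step(S, k, [2.0, 3.0], alpha=1.0, beta=1.0)
first = read(S, k)

# Writing a new value under the same key replaces the old one
# instead of adding to it, which is the point of the delta rule.
S = gated_delta_step(S, k, [5.0, 7.0], alpha=1.0, beta=1.0)
second = read(S, k)

print(first, second)
```

With `alpha = 1` and `beta = 1`, the second write fully overwrites the value stored under the same key. Smaller `alpha` values decay the whole memory before the write, and smaller `beta` values blend the old and new values.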