MoM
Collection of 9 items
Model for the papers "MoM: Linear Sequence Modeling with Mixture-of-Memories" and "Gated Delta Networks: Improving Mamba2 with Delta Rule".
The model was trained on a 100B-token sample of SlimPajama.