Linear-MoE: Linear Sequence Modeling Meets Mixture-of-Experts (arXiv:2503.05447, Mar 2025)
Liger: Linearizing Large Language Models to Gated Recurrent Structures (arXiv:2503.01496, Mar 3, 2025)
MoM: Linear Sequence Modeling with Mixture-of-Memories (arXiv:2502.13685, Feb 19, 2025)
LASP-2: Rethinking Sequence Parallelism for Linear Attention and Its Hybrid (arXiv:2502.07563, Feb 11, 2025)