A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond • arXiv 2503.21614 • Published 9 days ago
Linear-MoE: Linear Sequence Modeling Meets Mixture-of-Experts • arXiv 2503.05447 • Published 29 days ago
Liger: Linearizing Large Language Models to Gated Recurrent Structures • arXiv 2503.01496 • Published Mar 3
MoM: Linear Sequence Modeling with Mixture-of-Memories • arXiv 2502.13685 • Published Feb 19
CO2: Efficient Distributed Training with Full Communication-Computation Overlap • arXiv 2401.16265 • Published Jan 29, 2024
Various Lengths, Constant Speed: Efficient Language Modeling with Lightning Attention • arXiv 2405.17381 • Published May 27, 2024
LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training • arXiv 2411.15708 • Published Nov 24, 2024
MiniMax-01: Scaling Foundation Models with Lightning Attention • arXiv 2501.08313 • Published Jan 14
LASP-2: Rethinking Sequence Parallelism for Linear Attention and Its Hybrid • arXiv 2502.07563 • Published Feb 11