Post: Is it time to start developing sparse attention again? https://github.com/SmallDoges/flash-sparse-attention
Article: Trainable Dynamic Mask Sparse Attention: Bridging Efficiency and Effectiveness in Long-Context Language Models
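The core idea behind dynamic mask sparse attention is that each query attends to only a small, input-dependent subset of keys rather than the full sequence. As a minimal illustrative sketch (a simplified top-k stand-in, not the trainable masking algorithm from the article itself):

```python
import numpy as np

def dynamic_mask_attention(Q, K, V, k=4):
    """Sparse attention sketch: each query keeps only its top-k
    highest-scoring keys; all other positions are masked to -inf
    before the softmax, so they receive zero attention weight.
    (Hypothetical simplification, not the authors' method.)"""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)              # (n_q, n_k) attention logits
    # Build a per-query mask: True where a key is among the query's top-k
    topk_idx = np.argpartition(scores, -k, axis=-1)[:, -k:]
    mask = np.zeros_like(scores, dtype=bool)
    np.put_along_axis(mask, topk_idx, True, axis=-1)
    scores = np.where(mask, scores, -np.inf)   # prune masked positions
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                          # (n_q, d_v)

rng = np.random.default_rng(0)
Q = rng.normal(size=(8, 16))
K = rng.normal(size=(32, 16))
V = rng.normal(size=(32, 16))
out = dynamic_mask_attention(Q, K, V, k=4)
print(out.shape)  # (8, 16)
```

With k fixed at 4, each query row computes only 4 nonzero attention weights regardless of sequence length; in the trainable setting the mask itself would be learned rather than derived from a hard top-k rule.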
Doge — family of small language models:
- SmallDoge/Doge-320M-Instruct · Question Answering · 0.3B params · Updated Aug 8
- SmallDoge/Doge-160M-Instruct · Question Answering · 0.2B params · Updated Aug 8
- SmallDoge/Doge-60M-Instruct · Question Answering · 54.6M params · Updated Aug 8
- SmallDoge/Doge-20M-Instruct · Question Answering · 13.1M params · Updated Apr 17