Li Dong

unilm

AI & ML interests

Language Model Pre-Training

Recent Activity

authored a paper 30 days ago

Universal YOCO for Efficient Depth Scaling

upvoted a paper about 1 month ago

Universal YOCO for Efficient Depth Scaling

submitted a paper about 1 month ago

Universal YOCO for Efficient Depth Scaling

Organizations

upvoted a paper about 1 month ago

Universal YOCO for Efficient Depth Scaling

Paper • 2604.01220 • Published about 1 month ago • 18

upvoted 2 papers about 2 months ago

On-Policy Context Distillation for Language Models

Paper • 2602.12275 • Published Feb 12 • 3

Online Experiential Learning for Language Models

Paper • 2603.16856 • Published Mar 17 • 59

upvoted an article 2 months ago

Article

Unlocking Agentic RL Training for GPT-OSS: A Practical Retrospective

Jan 27

upvoted 2 papers 3 months ago

VIBEVOICE-ASR Technical Report

Paper • 2601.18184 • Published Jan 26 • 23

LLM-in-Sandbox Elicits General Agentic Intelligence

Paper • 2601.16206 • Published Jan 22 • 87

upvoted an article 3 months ago

Article

Differential Transformer V2

Jan 20

upvoted 2 papers 3 months ago

Multiplex Thinking: Reasoning via Token-wise Branch-and-Merge

Paper • 2601.08808 • Published Jan 13 • 39

Towards Stable and Effective Reinforcement Learning for Mixture-of-Experts

Paper • 2510.23027 • Published Oct 27, 2025 • 2

upvoted 2 collections 5 months ago

VibeVoice Models

Collection

3 items • Updated Dec 6, 2025 • 6

GAD-Models

Collection

Model checkpoints of Black-Box On-Policy Distillation of Large Language Models • 5 items • Updated Nov 17, 2025 • 6

upvoted 9 papers 6 months ago

The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution

Paper • 2510.25726 • Published Oct 29, 2025 • 46

Every Attention Matters: An Efficient Hybrid Architecture for Long-Context Reasoning

Paper • 2510.19338 • Published Oct 22, 2025 • 117

Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing

Paper • 2510.19808 • Published Oct 22, 2025 • 30

AdaSPEC: Selective Knowledge Distillation for Efficient Speculative Decoders

Paper • 2510.19779 • Published Oct 22, 2025 • 62

BitNet Distillation

Paper • 2510.13998 • Published Oct 15, 2025 • 60

FineVision: Open Data Is All You Need

Paper • 2510.17269 • Published Oct 20, 2025 • 80
