Qian Liu

SivilTaram

http://siviltaram.github.io/

AI & ML interests

Cooking cool things

Recent Activity

upvoted 3 papers 13 days ago

Agent Lightning: Train ANY AI Agents with Reinforcement Learning

Paper • 2508.03680 • Published 15 days ago • 61

Efficient Agents: Building Effective Agents While Reducing Cost

Paper • 2508.02694 • Published 27 days ago • 81

R-Zero: Self-Evolving Reasoning LLM from Zero Data

Paper • 2508.05004 • Published 14 days ago • 114

upvoted a paper 15 days ago

Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed Inference

Paper • 2508.02193 • Published 17 days ago • 126

upvoted a paper 26 days ago

Group Sequence Policy Optimization

Paper • 2507.18071 • Published 28 days ago • 289

upvoted 3 papers about 1 month ago

SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories?

Paper • 2507.12415 • Published Jul 16 • 41

Afterburner: Reinforcement Learning Facilitates Self-Improving Code Efficiency Optimization

Paper • 2505.23387 • Published May 29 • 9

First Return, Entropy-Eliciting Explore

Paper • 2507.07017 • Published Jul 9 • 23

upvoted an article about 1 month ago

SmolLM3: smol, multilingual, long-context reasoner

Article • Jul 8 • 631

upvoted 3 papers about 1 month ago

Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving

Paper • 2507.06229 • Published Jul 8 • 73

CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization

Paper • 2507.06181 • Published Jul 8 • 41

A Survey on Latent Reasoning

Paper • 2507.06203 • Published Jul 8 • 88

upvoted 4 papers about 2 months ago

ZeCO: Zero Communication Overhead Sequence Parallelism for Linear Attention

Paper • 2507.01004 • Published Jul 1 • 10

SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning

Paper • 2506.24119 • Published Jun 30 • 48

OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling

Paper • 2506.20512 • Published Jun 25 • 46

MMSearch-R1: Incentivizing LMMs to Search

Paper • 2506.20670 • Published Jun 25 • 62

upvoted 2 collections 2 months ago

MiniMax-M1

MiniMax-M1, the world's first open-weight, large-scale hybrid-attention reasoning model. • 6 items • Updated Jul 3 • 110

AceReason

Math and Code reasoning model trained through reinforcement learning (RL) • 7 items • Updated 6 days ago • 14

upvoted 2 papers 2 months ago

TaskCraft: Automated Generation of Agentic Tasks

Paper • 2506.10055 • Published Jun 11 • 32

MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention

Paper • 2506.13585 • Published Jun 16 • 261