
zhu

xuekai

AI & ML interests

None yet

xuekai's activity

upvoted a paper 3 days ago

RuleReasoner: Reinforced Rule-based Reasoning via Domain-aware Dynamic Sampling

Paper • 2506.08672 • Published 4 days ago • 28

upvoted a paper 16 days ago

The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models

Paper • 2505.22617 • Published 17 days ago • 120

upvoted a paper 18 days ago

Discrete Markov Bridge

Paper • 2505.19752 • Published 19 days ago • 17

upvoted a paper 22 days ago

DriveMoE: Mixture-of-Experts for Vision-Language-Action Model in End-to-End Autonomous Driving

Paper • 2505.16278 • Published 23 days ago • 5

authored a paper 25 days ago

Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space

Paper • 2505.13308 • Published 26 days ago • 26

upvoted a paper 25 days ago

Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space

Paper • 2505.13308 • Published 26 days ago • 26

authored a paper about 2 months ago

TTRL: Test-Time Reinforcement Learning

Paper • 2504.16084 • Published Apr 22 • 115

upvoted a paper about 2 months ago

TTRL: Test-Time Reinforcement Learning

Paper • 2504.16084 • Published Apr 22 • 115

upvoted a paper 3 months ago

Video-T1: Test-Time Scaling for Video Generation

Paper • 2503.18942 • Published Mar 24 • 88

authored a paper 3 months ago

Technologies on Effectiveness and Efficiency: A Survey of State Spaces Models

Paper • 2503.11224 • Published Mar 14 • 27

upvoted a paper 3 months ago

Technologies on Effectiveness and Efficiency: A Survey of State Spaces Models

Paper • 2503.11224 • Published Mar 14 • 27

upvoted an article 4 months ago

DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge

Article • Feb 7 • 152

upvoted a paper 4 months ago

MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding

Paper • 2501.18362 • Published Jan 30 • 22

upvoted 2 articles 5 months ago

Putting RL back in RLHF

Article • and 1 other • Jun 12, 2024 • 93

Process Reinforcement through Implicit Rewards

Article • and 1 other • Jan 3 • 27

upvoted a paper 5 months ago

Free Process Rewards without Process Labels

Paper • 2412.01981 • Published Dec 2, 2024 • 35

upvoted an article 5 months ago

Understanding InstaFlow/Rectified Flow

Article • Oct 6, 2023 • 28

upvoted a paper 6 months ago

Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization

Paper • 2412.17739 • Published Dec 23, 2024 • 42

commented on a paper 6 months ago

How to Synthesize Text Data without Model Collapse?

Paper • 2412.14689 • Published Dec 19, 2024 • 53

authored a paper 6 months ago

How to Synthesize Text Data without Model Collapse?

Paper • 2412.14689 • Published Dec 19, 2024 • 53