Peng Wang

stillarrow

https://peter-peng-w.github.io/

AI & ML interests

None yet

Recent Activity

liked a model 2 days ago

Nickyang/FastCuRL-1.5B-V3

liked a dataset 8 days ago

OpenAssistant/oasst1

Organizations

None yet

upvoted a collection 2 days ago

FastCuRL

The collection for the Paper "Curriculum Reinforcement Learning with Stage-wise Context Scaling for Efficient Training R1-like Reasoning Models" • 6 items • Updated May 29 • 2

upvoted a collection 22 days ago

"Physics of Language Models" series

6 items • Updated Aug 30, 2024 • 45

upvoted a paper 29 days ago

Reinforcement Pre-Training

Paper • 2506.08007 • Published Jun 9 • 249

upvoted a collection 29 days ago

Tool-Star

8 items • Updated 26 days ago • 4

upvoted a paper about 2 months ago

Understanding R1-Zero-Like Training: A Critical Perspective

Paper • 2503.20783 • Published Mar 26 • 55

upvoted 2 papers 3 months ago

DeepCritic: Deliberate Critique with Large Language Models

Paper • 2505.00662 • Published May 1 • 54

A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce

Paper • 2504.11343 • Published Apr 15 • 19

upvoted a collection 3 months ago

Gemma 3 Release

24 items • Updated 16 days ago • 419

upvoted an article 4 months ago

The Large Language Model Course

Article • Jan 16 • 196

upvoted 3 articles 5 months ago

Mastering Tensor Dimensions in Transformers

Article • Jan 12 • 79

KV Caching Explained: Optimizing Transformer Inference Efficiency

Article • Jan 30 • 100

Open R1: Update #2

Article • 7 authors • Feb 10 • 216

upvoted a collection 6 months ago

OpenMath

A collection of models and datasets introduced in "OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset" • 15 items • Updated 4 days ago • 44

upvoted 2 papers 6 months ago

The Lessons of Developing Process Reward Models in Mathematical Reasoning

Paper • 2501.07301 • Published Jan 13 • 100

LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning

Paper • 2410.02884 • Published Oct 3, 2024 • 55

upvoted a paper 7 months ago

ProcessBench: Identifying Process Errors in Mathematical Reasoning

Paper • 2412.06559 • Published Dec 9, 2024 • 84

upvoted a paper 12 months ago

Llemma: An Open Language Model For Mathematics

Paper • 2310.10631 • Published Oct 16, 2023 • 56

upvoted 3 papers about 1 year ago

DotaMath: Decomposition of Thought with Code Assistance and Self-correction for Mathematical Reasoning

Paper • 2407.04078 • Published Jul 4, 2024 • 21

We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?

Paper • 2407.01284 • Published Jul 1, 2024 • 82

LiteSearch: Efficacious Tree Search for LLM

Paper • 2407.00320 • Published Jun 29, 2024 • 40