Collections including paper arxiv:2504.16084

- Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model
  Paper • 2503.24290 • Published • 62
- I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders
  Paper • 2503.18878 • Published • 118
- START: Self-taught Reasoner with Tools
  Paper • 2503.04625 • Published • 111
- DAPO: An Open-Source LLM Reinforcement Learning System at Scale
  Paper • 2503.14476 • Published • 122

- Towards General-Purpose Model-Free Reinforcement Learning
  Paper • 2501.16142 • Published • 30
- DAPO: An Open-Source LLM Reinforcement Learning System at Scale
  Paper • 2503.14476 • Published • 122
- Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
  Paper • 2504.13837 • Published • 104
- Learning to Reason under Off-Policy Guidance
  Paper • 2504.14945 • Published • 73

- Step Back to Leap Forward: Self-Backtracking for Boosting Reasoning of Language Models
  Paper • 2502.04404 • Published • 24
- Learning Adaptive Parallel Reasoning with Language Models
  Paper • 2504.15466 • Published • 38
- TTRL: Test-Time Reinforcement Learning
  Paper • 2504.16084 • Published • 83
- THOUGHTTERMINATOR: Benchmarking, Calibrating, and Mitigating Overthinking in Reasoning Models
  Paper • 2504.13367 • Published • 24

- Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
  Paper • 2501.18585 • Published • 61
- RWKV-7 "Goose" with Expressive Dynamic State Evolution
  Paper • 2503.14456 • Published • 146
- DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement Learning
  Paper • 2503.15265 • Published • 47
- Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning
  Paper • 2503.15558 • Published • 46

- s1: Simple test-time scaling
  Paper • 2501.19393 • Published • 120
- Competitive Programming with Large Reasoning Models
  Paper • 2502.06807 • Published • 70
- LIMO: Less is More for Reasoning
  Paper • 2502.03387 • Published • 61
- Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling
  Paper • 2502.06703 • Published • 151

- Scaling LLM Inference with Optimized Sample Compute Allocation
  Paper • 2410.22480 • Published
- Test-time Computing: from System-1 Thinking to System-2 Thinking
  Paper • 2501.02497 • Published • 46
- Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective
  Paper • 2412.14135 • Published
- Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought
  Paper • 2501.04682 • Published • 98

- Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
  Paper • 2412.18319 • Published • 40
- Token-Budget-Aware LLM Reasoning
  Paper • 2412.18547 • Published • 47
- Efficiently Serving LLM Reasoning Programs with Certaindex
  Paper • 2412.20993 • Published • 38
- B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
  Paper • 2412.17256 • Published • 48

- How to Synthesize Text Data without Model Collapse?
  Paper • 2412.14689 • Published • 53
- SepLLM: Accelerate Large Language Models by Compressing One Segment into One Separator
  Paper • 2412.12094 • Published • 11
- StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
  Paper • 2306.07691 • Published • 8
- iSTFTNet: Fast and Lightweight Mel-Spectrogram Vocoder Incorporating Inverse Short-Time Fourier Transform
  Paper • 2203.02395 • Published

- LLMs + Persona-Plug = Personalized LLMs
  Paper • 2409.11901 • Published • 34
- To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning
  Paper • 2409.12183 • Published • 39
- Chain of Thought Empowers Transformers to Solve Inherently Serial Problems
  Paper • 2402.12875 • Published • 13
- TPI-LLM: Serving 70B-scale LLMs Efficiently on Low-resource Edge Devices
  Paper • 2410.00531 • Published • 33