Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2504.20571

Reinforcement Learning for Reasoning in Large Language Models with One Training Example

Paper • 2504.20571 • Published 5 days ago • 76

Reinforcement Learning for Reasoning in Large Language Models with One Training Example

Paper • 2504.20571 • Published 5 days ago • 76

Reinforcement Learning for Reasoning in Large Language Models with One Training Example

Paper • 2504.20571 • Published 5 days ago • 76

Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems

Paper • 2504.01990 • Published Mar 31 • 272
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

Paper • 2504.10479 • Published 19 days ago • 252
What, How, Where, and How Well? A Survey on Test-Time Scaling in Large Language Models

Paper • 2503.24235 • Published Mar 31 • 53
Seedream 3.0 Technical Report

Paper • 2504.11346 • Published 19 days ago • 52

I-Con: A Unifying Framework for Representation Learning

Paper • 2504.16929 • Published 10 days ago • 29
LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities

Paper • 2504.16078 • Published 11 days ago • 20
WALL-E 2.0: World Alignment by NeuroSymbolic Learning improves World Model-based LLM Agents

Paper • 2504.15785 • Published 12 days ago • 18
OTC: Optimal Tool Calls via Reinforcement Learning

Paper • 2504.14870 • Published 13 days ago • 33

To Read collection

interesting papers to read

Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model

Paper • 2503.24290 • Published Mar 31 • 63
I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders

Paper • 2503.18878 • Published Mar 24 • 118
START: Self-taught Reasoner with Tools

Paper • 2503.04625 • Published Mar 6 • 111
DAPO: An Open-Source LLM Reinforcement Learning System at Scale

Paper • 2503.14476 • Published Mar 18 • 123

Challenging the Boundaries of Reasoning: An Olympiad-Level Math Benchmark for Large Language Models

Paper • 2503.21380 • Published Mar 27 • 37
Video-R1: Reinforcing Video Reasoning in MLLMs

Paper • 2503.21776 • Published Mar 27 • 78
Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks

Paper • 2503.21696 • Published Mar 27 • 22
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models

Paper • 2504.10449 • Published 19 days ago • 11

VLM papers for science

Collecting papers that help understand how well VLMs perform in tasks related to science

Towards Self-Improving Systematic Cognition for Next-Generation Foundation MLLMs

Paper • 2503.12303 • Published Mar 16 • 7
DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding

Paper • 2503.12797 • Published Mar 17 • 30
R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization

Paper • 2503.12937 • Published Mar 17 • 29
MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research

Paper • 2503.13399 • Published Mar 17 • 21

Towards General-Purpose Model-Free Reinforcement Learning

Paper • 2501.16142 • Published Jan 27 • 30
DAPO: An Open-Source LLM Reinforcement Learning System at Scale

Paper • 2503.14476 • Published Mar 18 • 123
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

Paper • 2504.13837 • Published 15 days ago • 118
Learning to Reason under Off-Policy Guidance

Paper • 2504.14945 • Published 13 days ago • 79

Research Papers/Reviews/Literature

Daily Research papers and review including older relevant content.

Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs

Paper • 2501.18585 • Published Jan 30 • 61
RWKV-7 "Goose" with Expressive Dynamic State Evolution

Paper • 2503.14456 • Published Mar 18 • 147
DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement Learning

Paper • 2503.15265 • Published Mar 19 • 47
Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning

Paper • 2503.15558 • Published Mar 18 • 46

Previous
1
2
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs