Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2504.20157

reasoning training via RLAIF

Toward Evaluative Thinking: Meta Policy Optimization with Evolving Reward Models

Paper • 2504.20157 • Published 5 days ago • 32

Toward Evaluative Thinking: Meta Policy Optimization with Evolving Reward Models

Paper • 2504.20157 • Published 5 days ago • 32
The Leaderboard Illusion

Paper • 2504.20879 • Published 5 days ago • 56
ReasonIR: Training Retrievers for Reasoning Tasks

Paper • 2504.20595 • Published 5 days ago • 44

Reinforcement Learning

LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities

Paper • 2504.16078 • Published 11 days ago • 20
Toward Evaluative Thinking: Meta Policy Optimization with Evolving Reward Models

Paper • 2504.20157 • Published 5 days ago • 32

To Read collection

interesting papers to read

Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model

Paper • 2503.24290 • Published Mar 31 • 63
I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders

Paper • 2503.18878 • Published Mar 24 • 118
START: Self-taught Reasoner with Tools

Paper • 2503.04625 • Published Mar 6 • 111
DAPO: An Open-Source LLM Reinforcement Learning System at Scale

Paper • 2503.14476 • Published Mar 18 • 123

about 9 hours ago

SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models

Paper • 2412.11605 • Published Dec 16, 2024 • 18
Byte Latent Transformer: Patches Scale Better Than Tokens

Paper • 2412.09871 • Published Dec 13, 2024 • 102
Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization

Paper • 2412.17739 • Published Dec 23, 2024 • 42
SKETCH: Structured Knowledge Enhanced Text Comprehension for Holistic Retrieval

Paper • 2412.15443 • Published Dec 19, 2024 • 10

Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 61
Llama-3.1-FoundationAI-SecurityLLM-Base-8B Technical Report

Paper • 2504.21039 • Published 6 days ago • 15
SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published 26 days ago • 177
DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning

Paper • 2504.07128 • Published Apr 2 • 83

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs