Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2504.15257

RL+reason model

RL + Transformer = A General-Purpose Problem Solver

Paper • 2501.14176 • Published Jan 24 • 28
Towards General-Purpose Model-Free Reinforcement Learning

Paper • 2501.16142 • Published Jan 27 • 30
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

Paper • 2501.17161 • Published Jan 28 • 120
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization

Paper • 2412.12098 • Published Dec 16, 2024 • 5

about 23 hours ago

CoRAG: Collaborative Retrieval-Augmented Generation

Paper • 2504.01883 • Published 24 days ago • 10
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning

Paper • 2504.08837 • Published 16 days ago • 42
Mavors: Multi-granularity Video Representation for Multimodal Large Language Model

Paper • 2504.10068 • Published 12 days ago • 30
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations

Paper • 2504.10481 • Published 12 days ago • 84

To Read collection

interesting papers to read

Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model

Paper • 2503.24290 • Published 26 days ago • 62
I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders

Paper • 2503.18878 • Published Mar 24 • 118
START: Self-taught Reasoner with Tools

Paper • 2503.04625 • Published Mar 6 • 111
DAPO: An Open-Source LLM Reinforcement Learning System at Scale

Paper • 2503.14476 • Published Mar 18 • 122

ReAct: Synergizing Reasoning and Acting in Language Models

Paper • 2210.03629 • Published Oct 6, 2022 • 25
AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation Framework

Paper • 2308.08155 • Published Aug 16, 2023 • 7
Preference Learning Unlocks LLMs' Psycho-Counseling Skills

Paper • 2502.19731 • Published Feb 27 • 7
Learning to Reason under Off-Policy Guidance

Paper • 2504.14945 • Published 6 days ago • 72

MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning

Paper • 2310.03731 • Published Oct 5, 2023 • 29
ToolChain*: Efficient Action Space Navigation in Large Language Models with A* Search

Paper • 2310.13227 • Published Oct 20, 2023 • 13
ChipNeMo: Domain-Adapted LLMs for Chip Design

Paper • 2311.00176 • Published Oct 31, 2023 • 9
Language Models can be Logical Solvers

Paper • 2311.06158 • Published Nov 10, 2023 • 23

Contrastive Decoding Improves Reasoning in Large Language Models

Paper • 2309.09117 • Published Sep 17, 2023 • 39
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models

Paper • 2310.08491 • Published Oct 12, 2023 • 55
Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding

Paper • 2411.04282 • Published Nov 6, 2024 • 36
Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models

Paper • 2411.14432 • Published Nov 21, 2024 • 26

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs