Testerpce's Collections

Process Reward Modelling

updated 5 days ago

  • ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs

    Paper • 2506.18896 • Published 17 days ago • 28

  • Web-Shepherd: Advancing PRMs for Reinforcing Web Agents

    Paper • 2505.15277 • Published May 21 • 102

  • PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models

    Paper • 2501.03124 • Published Jan 6 • 14

  • Is PRM Necessary? Problem-Solving RL Implicitly Induces PRM Capability in LLMs

    Paper • 2505.11227 • Published May 16