- **RetrievalPRM**: "Retrieval-Augmented Process Reward Model for Generalizable Mathematical Reasoning" [arXiv 2025.02]
- **Multilingual PRM**: "Demystifying Multilingual Chain-of-Thought in Process Reward Modeling" [arXiv 2025.02] [Code]
- **PURE PRM**: "Stop Gamma Decay: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning" [Blog] [Code] [Model] [Data]
- **CFPRM**: "Coarse-to-Fine Process Reward Modeling for Mathematical Reasoning" [arXiv 2025.01]
- **Qwen2.5-Math PRM**: "The Lessons of Developing Process Reward Models in Mathematical Reasoning" [arXiv 2025.01] [Website] [Model]
- **PPM**: "rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking" [arXiv 2025.01] [Code]
- **ER-PRM**: "Entropy-Regularized Process Reward Model" [arXiv 2024.12] [Code] [Website] [Model] [Data]
- **Implicit PRM**: "Free Process Rewards without Process Labels" [arXiv 2024.12] [Code] [Website] [Model] [Data]
- **Skywork PRM**: "Skywork-o1 Open Series" [Model]
- **RLHFlow PRM**: "An Implementation of Generative PRM" [Code] [Model & Data]
- **PQM**: "Process Reward Model with Q-Value Rankings" [ICLR 2025] [arXiv 2024.10] [Code] [Model]
- **Math-psa**: "OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models" [arXiv 2024.10] [Code] [Website] [Model] [Data]
- **PAV**: "Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning" [ICLR 2025] [arXiv 2024.10]
- **OmegaPRM**: "Improve Mathematical Reasoning in Language Models by Automated Process Supervision" [arXiv 2024.06] [Code (Third Party)]
- **Math-Shepherd**: "Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations" [ACL 2024] [arXiv 2023.12] [Model] [Data]
- "Let's reward step by step: Step-Level reward model as the Navigators for Reasoning" [arXiv 2023.10]
- "Let's Verify Step by Step" [ICLR 2024] [arXiv 2023.05] [Data] [Blog]
- "Solving math word problems with process- and outcome-based feedback" [arXiv 2022.11]
- **ASPRM**: "AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence" [arXiv 2025.02]
- **AgentPRM**: "Process Reward Models for LLM Agents: Practical Framework and Directions" [arXiv 2025.02] [Code]
- **VersaPRM**: "VersaPRM: Multi-Domain Process Reward Model via Synthetic Reasoning Data" [arXiv 2025.02] [Code] [Model] [Data]
- **MedS$^3$**: "MedS$^3$: Towards Medical Small Language Models with Self-Evolved Slow Thinking" [arXiv 2025.01] [Code] [Model] [Data]
- **OpenPRM**: "OpenPRM: Building Open-domain Process-based Reward Models with Preference Trees" [ICLR 2025]
- **o1-Coder**: "o1-Coder: an o1 Replication for Coding" [arXiv 2024.12] [Code]
- "Process Supervision-Guided Policy Optimization for Code Generation" [arXiv 2024.10]
- **PRMBench**: "PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models" [arXiv 2025.01] [Code] [Website] [Data]
- **ProcessBench**: "ProcessBench: Identifying Process Errors in Mathematical Reasoning" [arXiv 2024.12] [Code] [Model] [Data]
If you find a relevant paper that is missing from this list, feel free to open an issue or submit a pull request. Please use the following format to contribute:
- **Method Name**: "Title" [[Journal/Conference](Link)] [[arXiv Year.Month](Link)] [[Code](Link)] [[Website](Link)] [[Model](Link)] [[Data](Link)]
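For reference, a filled-in entry following this format would look like the line below (the method name, title, and links here are placeholders, not a real paper):

- **ExamplePRM**: "An Example Process Reward Model Paper" [[arXiv 2025.01](https://arxiv.org/abs/0000.00000)] [[Code](https://github.com/example/example-prm)]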
If you find this work helpful, please consider citing the repository:
```bibtex
@misc{Awesome-Process-Reward-Models,
    title = {Awesome Process Reward Models},
    author = {Runze Liu and Jian Zhao and Kaiyan Zhang and Junqi Gao and Xiu Li and Biqing Qi and Wanli Ouyang and Bowen Zhou},
    howpublished = {\url{https://github.com/RyanLiu112/Awesome-Process-Reward-Models}},
    year = {2025}
}
```
Our recent work on LLM test-time scaling with PRMs for mathematical tasks:
```bibtex
@article{liu2025can,
    title = {Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling},
    author = {Runze Liu and Junqi Gao and Jian Zhao and Kaiyan Zhang and Xiu Li and Biqing Qi and Wanli Ouyang and Bowen Zhou},
    journal = {arXiv preprint arXiv:2502.06703},
    year = {2025}
}
```