- **RetrievalPRM**: "Retrieval-Augmented Process Reward Model for Generalizable Mathematical Reasoning" [arXiv 2025.02]
- **Multilingual PRM**: "Demystifying Multilingual Chain-of-Thought in Process Reward Modeling" [arXiv 2025.02] [Code]
- **PURE PRM**: "Stop Gamma Decay: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning" [Blog] [Code] [Model] [Data]
- **CFPRM**: "Coarse-to-Fine Process Reward Modeling for Mathematical Reasoning" [arXiv 2025.01]
- **Qwen2.5-Math PRM**: "The Lessons of Developing Process Reward Models in Mathematical Reasoning" [arXiv 2025.01] [Website] [Model]
- **PPM**: "rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking" [arXiv 2025.01] [Code]
- **ER-PRM**: "Entropy-Regularized Process Reward Model" [arXiv 2024.12] [Code] [Website] [Model] [Data]
- **Implicit PRM**: "Free Process Rewards without Process Labels" [arXiv 2024.12] [Code] [Website] [Model] [Data]
- **Skywork PRM**: "Skywork-o1 Open Series" [Model]
- **RLHFlow PRM**: "An Implementation of Generative PRM" [Code] [Model & Data]
- **PQM**: "Process Reward Model with Q-Value Rankings" [ICLR 2025] [arXiv 2024.10] [Code] [Model]
- **Math-psa**: "OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models" [arXiv 2024.10] [Code] [Website] [Model] [Data]
- **PAV**: "Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning" [ICLR 2025] [arXiv 2024.10]
- **OmegaPRM**: "Improve Mathematical Reasoning in Language Models by Automated Process Supervision" [arXiv 2024.06] [Code (Third Party)]
- **Math-Shepherd**: "Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations" [ACL 2024] [arXiv 2023.12] [Model] [Data]
- "Let's reward step by step: Step-Level reward model as the Navigators for Reasoning" [arXiv 2023.10]
- "Let's Verify Step by Step" [ICLR 2024] [arXiv 2023.05] [Data] [Blog]
- "Solving math word problems with process- and outcome-based feedback" [arXiv 2022.11]
- **ASPRM**: "AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence" [arXiv 2025.02]
- **AgentPRM**: "Process Reward Models for LLM Agents: Practical Framework and Directions" [arXiv 2025.02] [Code]
- **VersaPRM**: "VersaPRM: Multi-Domain Process Reward Model via Synthetic Reasoning Data" [arXiv 2025.02] [Code] [Model] [Data]
- **MedS$^3$**: "MedS$^3$: Towards Medical Small Language Models with Self-Evolved Slow Thinking" [arXiv 2025.01] [Code] [Model] [Data]
- **OpenPRM**: "OpenPRM: Building Open-domain Process-based Reward Models with Preference Trees" [ICLR 2025]
- **o1-Coder**: "o1-Coder: an o1 Replication for Coding" [arXiv 2024.12] [Code]
- "Process Supervision-Guided Policy Optimization for Code Generation" [arXiv 2024.10]
- **PRMBench**: "PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models" [arXiv 2025.01] [Code] [Website] [Data]
- **ProcessBench**: "ProcessBench: Identifying Process Errors in Mathematical Reasoning" [arXiv 2024.12] [Code] [Model] [Data]
If you find a relevant paper that is missing from this list, feel free to open an issue or submit a pull request. Please use the following format to contribute:
- **Method Name**: "Title" [[Journal/Conference](Link)] [[arXiv Year.Month](Link)] [[Code](Link)] [[Website](Link)] [[Model](Link)] [[Data](Link)]
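For reference, a filled-in entry following this format would look like the line below (the method name, title, and links here are placeholders, not a real paper):

- **ExamplePRM**: "An Example Process Reward Model Paper" [[arXiv 2025.01](https://arxiv.org/abs/0000.00000)] [[Code](https://github.com/example/example-prm)]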
If you find this work helpful, please consider citing the repository:
```bibtex
@misc{Awesome-Process-Reward-Models,
    title = {Awesome Process Reward Models},
    author = {Runze Liu and Jian Zhao and Kaiyan Zhang and Junqi Gao and Xiu Li and Biqing Qi and Wanli Ouyang and Bowen Zhou},
    howpublished = {\url{https://github.com/RyanLiu112/Awesome-Process-Reward-Models}},
    year = {2025}
}
```
Our recent work on LLM test-time scaling with PRMs for mathematical tasks:
```bibtex
@article{liu2025can,
    title = {Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling},
    author = {Runze Liu and Junqi Gao and Jian Zhao and Kaiyan Zhang and Xiu Li and Biqing Qi and Wanli Ouyang and Bowen Zhou},
    journal = {arXiv preprint arXiv:2502.06703},
    year = {2025}
}
```