RyanLiu112/Awesome-Process-Reward-Models

Awesome Process Reward Models

🔗 Table of Contents

PRMs for Mathematical Tasks

💻 PRMs for Other Tasks

  • ASPRM: "AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence" [arXiv 2025.02]

  • AgentPRM: "Process Reward Models for LLM Agents: Practical Framework and Directions" [arXiv 2025.02] [Code]

  • VersaPRM: "VersaPRM: Multi-Domain Process Reward Model via Synthetic Reasoning Data" [arXiv 2025.02] [Code] [Model] [Data]

  • MedS$^3$: "MedS$^3$: Towards Medical Small Language Models with Self-Evolved Slow Thinking" [arXiv 2025.01] [Code] [Model] [Data]

  • OpenPRM: "OpenPRM: Building Open-domain Process-based Reward Models with Preference Trees" [ICLR 2025]

  • o1-Coder: "o1-Coder: an o1 Replication for Coding" [arXiv 2024.12] [Code]

  • "Process Supervision-Guided Policy Optimization for Code Generation" [arXiv 2024.10]

📊 Benchmarks

💪 Contributing

If you find a paper that should be included but is missing, feel free to create an issue or submit a pull request. Please use the following format to contribute:

- **Method Name**: "Title" [[Journal/Conference](Link)] [[arXiv Year.Month](Link)] [[Code](Link)] [[Website](Link)] [[Model](Link)] [[Data](Link)]
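As a rough illustration of the format above, here is a minimal Python sketch that renders one entry. The helper name `format_entry`, its parameters, and the example title, method name, and URLs are all hypothetical and are not part of this repository; entries can just as well be written by hand.

```python
def format_entry(title, method=None, links=None):
    """Render one list entry in the contribution format.

    `links` maps a label such as "arXiv 2025.02" or "Code" to a URL.
    This helper is a hypothetical illustration, not a repository tool.
    """
    links = links or {}
    # Entries without a method name start directly with the quoted title.
    head = f'**{method}**: "{title}"' if method else f'"{title}"'
    # Each link is rendered as [[Label](URL)], separated by single spaces.
    tags = " ".join(f"[[{label}]({url})]" for label, url in links.items())
    return f"- {head} {tags}".rstrip()


# Placeholder values only; replace them with the real paper details.
print(format_entry(
    "Example Paper Title",
    method="ExamplePRM",
    links={
        "arXiv 2025.02": "https://example.com/paper",
        "Code": "https://example.com/code",
    },
))
```

Links that do not apply to a paper are simply left out of `links`, which matches how existing entries list only the resources they have.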

Citation

If you find this work helpful, please consider citing the repository:

@misc{Awesome-Process-Reward-Models,
    title        = {Awesome Process Reward Models},
    author       = {Runze Liu and Jian Zhao and Kaiyan Zhang and Junqi Gao and Xiu Li and Biqing Qi and Wanli Ouyang and Bowen Zhou},
    howpublished = {\url{https://github.com/RyanLiu112/Awesome-Process-Reward-Models}},
    year         = {2025}
}

Our recent work on LLM test-time scaling with PRMs on mathematical tasks:

@article{liu2025can,
    title   = {Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling},
    author  = {Runze Liu and Junqi Gao and Jian Zhao and Kaiyan Zhang and Xiu Li and Biqing Qi and Wanli Ouyang and Bowen Zhou},
    journal = {arXiv preprint arXiv:2502.06703},
    year    = {2025}
}

About

A comprehensive collection of process reward models.
