Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP new GRPO dataset and task: formally-verified program correctness #379

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

ocramz
Copy link

@ocramz ocramz commented Feb 20, 2025

A new dataset and family of tasks to reward models that judge code snippets as "correct" in the deductive program verification sense [1][2].

We provide two API endpoints:

  • program triple data generation : random program generation in a toy subset of Python, together with preconditions and postconditions. Additionally, the API endpoint returns the execution "trace" (how the variable environment is mutated after each statement).
  • program triple verification : parse model completions and return verification answer.

The API endpoints are parametric, so it's possible to train the model on small ASTs and test on larger ones, or longer programs, etc. to do small-to-large generalization trials.

  • ✅ add GRPO helpers to use the new API endpoints: src/open_r1/rewards/api/code/unfoldml/htgen.py and src/open_r1/rewards/code/htgen.py
  • ✅ add unit/integration tests for the new API bindings
  • ❓ FEEDBACK PLEASE : how to prompt a model under test using structured information? Should we use a specific templating format? Should the prompt be assembled inside the GRPO iterable dataset? for now we assemble the prompt in the dataset generator.

[1] https://en.wikipedia.org/wiki/Correctness_(computer_science)
[2] https://en.wikipedia.org/wiki/Hoare_logic

cc @Muhtasham @vumichien and @lewtun @qgallouedec ^^

@ocramz ocramz mentioned this pull request Feb 20, 2025
@ocramz ocramz changed the title WIP Feature/htgen dataset WIP Feature/htgen dataset and task Feb 20, 2025
@ocramz ocramz changed the title WIP Feature/htgen dataset and task WIP new GRPO dataset and task: formally-verified program correctness Feb 20, 2025
@ocramz ocramz force-pushed the feature/htgen-dataset branch from 69b44ca to 41af428 Compare February 21, 2025 07:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants