WIP new GRPO dataset and task: formally-verified program correctness #379

ocramz · 2025-02-20T04:26:27Z

A new dataset and family of tasks to reward models that judge code snippets as "correct" in the deductive program verification sense [1][2].

We provide two API endpoints:

Program triple data generation: generates random programs in a toy subset of Python, together with their preconditions and postconditions. The endpoint also returns the execution "trace" (the state of the variable environment after each statement).
Program triple verification: parses model completions and returns the verification answer.
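
To make the terms above concrete, here is a minimal sketch of a program triple (precondition, program, postcondition) and of an execution trace. It is not the API's implementation: the names `run_with_trace` and `holds_on` are hypothetical, and checking a triple on a single initial state is only a test, whereas the verification endpoint is meant to decide correctness deductively.

```python
from typing import Callable, Dict, List

Env = Dict[str, int]  # variable environment: name -> integer value


def run_with_trace(statements: List[str], env: Env) -> List[Env]:
    """Execute each assignment and record a copy of the environment after it."""
    env = dict(env)  # do not mutate the caller's environment
    trace: List[Env] = []
    for stmt in statements:
        # Builtins are disabled: the toy statements only need arithmetic.
        exec(stmt, {"__builtins__": {}}, env)
        trace.append(dict(env))
    return trace


def holds_on(
    pre: Callable[[Env], bool],
    statements: List[str],
    post: Callable[[Env], bool],
    env: Env,
) -> bool:
    """Check {pre} program {post} on ONE initial state (a test, not a proof)."""
    if not pre(env):
        return True  # the triple says nothing about states violating the precondition
    trace = run_with_trace(statements, env)
    final = trace[-1] if trace else dict(env)
    return post(final)


# Hypothetical triple: {x >= 0}  x = x + 1; y = x * 2  {y >= 2}
program = ["x = x + 1", "y = x * 2"]
pre = lambda e: e["x"] >= 0
post = lambda e: e["y"] >= 2

trace = run_with_trace(program, {"x": 3, "y": 0})
ok = holds_on(pre, program, post, {"x": 0, "y": 0})
```

Here `trace` holds one environment snapshot per statement, and `ok` reports whether the postcondition held on the chosen initial state.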

The API endpoints are parametric, so it is possible to train the model on small ASTs and test it on larger ASTs or longer programs, for example to run small-to-large generalization trials.

✅ add GRPO helpers to use the new API endpoints: src/open_r1/rewards/api/code/unfoldml/htgen.py and src/open_r1/rewards/code/htgen.py
✅ add unit/integration tests for the new API bindings
❓ FEEDBACK PLEASE: ~~how to prompt a model under test using structured information? Should we use a specific templating format? Should the prompt be assembled inside the GRPO iterable dataset?~~ For now we assemble the prompt in the dataset generator.
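
As a rough illustration of assembling the prompt in the dataset generator, the sketch below yields one record per program triple. The template wording, the field names (`pre`, `program`, `post`, `prompt`) and the function `prompt_records` are assumptions made for this example, not the code in this PR.

```python
from typing import Dict, Iterable, Iterator

# Hypothetical template; the wording used in the PR may differ.
PROMPT_TEMPLATE = (
    "Given the precondition, program and postcondition below, "
    "answer 'correct' or 'incorrect'.\n"
    "Precondition: {pre}\n"
    "Program:\n{program}\n"
    "Postcondition: {post}\n"
)


def prompt_records(triples: Iterable[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Lazily turn program triples into records carrying a ready-made prompt."""
    for triple in triples:
        yield {
            "prompt": PROMPT_TEMPLATE.format(
                pre=triple["pre"],
                program=triple["program"],
                post=triple["post"],
            )
        }


# Hypothetical triple, written inline instead of being fetched from the API.
records = list(
    prompt_records([{"pre": "x >= 0", "program": "x = x + 1", "post": "x >= 1"}])
)
```

Because the generator is lazy, it could wrap a stream of triples of any length without building the whole dataset in memory.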

[1] https://en.wikipedia.org/wiki/Correctness_(computer_science)
[2] https://en.wikipedia.org/wiki/Hoare_logic

cc @Muhtasham @vumichien and @lewtun @qgallouedec ^^

Marco Zocca added 2 commits February 20, 2025 04:57

wip adding HTGen dataset and benchmark

591f3c1

add API test

69b44ca

ocramz mentioned this pull request Feb 20, 2025

Datasets for code #28

Open

ocramz changed the title ~~WIP Feature/htgen dataset~~ WIP Feature/htgen dataset and task Feb 20, 2025

ocramz changed the title ~~WIP Feature/htgen dataset and task~~ WIP new GRPO dataset and task: formally-verified program correctness Feb 20, 2025

Marco Zocca added 2 commits February 21, 2025 08:12

wip adding HTGen dataset and benchmark

b3a9587

add API test

41af428

ocramz force-pushed the feature/htgen-dataset branch from 69b44ca to 41af428 Compare February 21, 2025 07:12

Marco Zocca and others added 6 commits February 22, 2025 07:56

construct prompt in the dataset generator

58cbde4

Merge branch 'main' into feature/htgen-dataset

aa3523d

merge from upstream

cde339c

prompt construction

2d225e9

fix some typos and add more docstrings

d28e8f7

add reward

40fdef8

Muhtasham approved these changes Feb 22, 2025


Merge branch 'main' into feature/htgen-dataset

4341395
