Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document demonic non-determinism #3895

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion docs/src/SUMMARY.md
Original file line number Diff line number Diff line change
Expand Up @@ -30,11 +30,12 @@

- [Developer documentation](dev-documentation.md)
- [Coding conventions](./conventions.md)
- [Command cheat sheets](./cheat-sheets.md)
- [Working with CBMC](./cbmc-hacks.md)
- [Working with `rustc`](./rustc-hacks.md)
- [Migrating to StableMIR](./stable-mir.md)
- [Command cheat sheets](./cheat-sheets.md)
- [cargo kani assess](./dev-assess.md)
- [Demonic non-determinism](./demonic-nondet.md)
- [Testing](./testing.md)
- [Regression testing](./regression-testing.md)
- [(Experimental) Testing with a Large Number of Repositories](./repo-crawl.md)
Expand Down
80 changes: 80 additions & 0 deletions docs/src/demonic-nondet.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,80 @@
# Demonic non-determinism in CBMC and Kani

CBMC and Kani use demonic non-determinism to more efficiently prove certain
classes of program properties.
This document describes what we mean by "demonic non-determinism," how and why
CBMC and Kani use it, and what challenges arise as a consequence thereof.

## What is demonic non-determinism?

When we seek to prove that a property holds for all possible values of some
object, a counterexample demonstrates the existence of at least one such
value that violates the property.
Other properties range across all objects or expressions of a particular kind,
e.g., we want to prove that for all dereference expressions the underlying
pointer is valid.
For a counterexample to this property it suffices to find one pointer that
cannot be safely dereferenced.
We can use this fact to non-deterministically choose to only track a single
pointer across an execution via expressions of the form
`global_pointer = if nondet { new_pointer_to_track } else { global_pointer };`.
We can now use `global_pointer` in describing properties such that the property
fails if `new_pointer_to_track` is invalid and `global_pointer` equals
`new_pointer_to_track`.
Note that this is an "if" statement, not "if, and only if:" there may also be
some other invalid pointer `other_pointer` that is invalid at the time of
dereference, and the back-end solver may choose to resolve the non-determinism
such that the property fails because `global_pointer` equals `other_pointer`.
We call such use of non-determinism _demonic non-determinism_, because the
SAT or SMT-solver will resolve the non-deterministic choice such that a
counterexample is reported _if_ there exists a model such that the overall
property fails.
Comment on lines +24 to +31
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make sure I understand, imagine this case:

int* new_ptr_to_track = malloc(...);
int* other_ptr = malloc(...);
...
free(new_ptr_to_track); // the solver switches to new_ptr_to_track
free(other_ptr); // the solver stays with new_ptr_to_track
*other_ptr // this UB is not caught because we're tracking new_ptr_to_track still

I believe this part:

SAT or SMT-solver will resolve the non-deterministic choice such that a
counterexample is reported if there exists a model such that the overall
property fails.

suggests that this unsoundness is impossible; the solver will simulate all possible outcomes of the nondeterministic decision
global_pointer = if nondet { new_pointer_to_track } else { global_pointer };, which includes the case where global_pointer tracks other_ptr, and thereby find the UB.
Is it an accurate mental model to think of this as a sort of breadth-first search, where we explore all possible assignments to global_pointer at every point of the program and succeed iff the property holds in all cases? (I don't expect the solver to actually perform such a search, since that would be expensive, but is that in effect what happens)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for those with less background in non-determinism, I'd like a final sentence like:

Thus, choosing non-deterministically is not the same as choosing randomly--by choosing non-determistically, soundness is ensured, since the solver will always resolve the non-deterministic choice such that it finds violations of the property if they exist.

I think my primary confusion was initially conflating nondeterminism with randomness, so thinking that only tracking one object at a time would be unsound. This is somewhat redundant with what you have--I just want to hammer this point home, since I suspect this will be the most confusing/unintuitive point for people.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it an accurate mental model to think of this as a sort of breadth-first search, where we explore all possible assignments to global_pointer at every point of the program and succeed iff the property holds in all cases?

The global_pointer is indeed a symbolic pointer that ranges over all possible pointers in the program, controlled by the nondet choices used to update the variable.

These choices are hidden inputs for the program, so if you have n choices, the final formula encodes 2^n executions, one for each valuation of these variables. The breadth-first search analogy is not the best one though. We don't control in which order the SAT solver explores these decisions.

void foo() {
  // allocate objects
  int *p1 = malloc(...);
  int *p2 = malloc(...);

  // free objects
  free(p1);
  free(p2);

  // use dead pointers
  *p1 = 1;
  *p2 = 2;
}

The instrumented program looks like this:

// demonic monitor
static void* dead_ptr = (void *)0;

void foo() {

  // allocate objects
  int *p1 = malloc(...);
  int *p2 = malloc(...);

  // free objects
  // demonic choice variable
  bool b1 = nondet_bool();
  dead_ptr = b1  ? p1 : dead_ptr;
  free(p1);


  // demonic choice variable
  bool b2 = nondet_bool();
  dead_ptr = b2  ? p2 : dead_ptr;
  free(p2);

  // use dead pointers
  assert(!__CPROVER_same_object(p1, dead_ptr));
  *p1 = 1;
  assert(!__CPROVER_same_object(p2, dead_ptr));
  *p2 = 2;
}

So our program is controlled by b1, b2 we have 4 possible ways of picking values for these variables:

  1. 0, 0 = track null
  2. 0, 1 = track null then switch to p2
  3. 1, 0 = track null then switch to p1
  4. 1, 1 = track null then switch to p1 then switch to p2

The SAT solver analyzes all of them, but there is no notion of depth/breadth first.

The idea is that any property of the form forall x \in {x1, ,,,, xn}, P(x) can be checked by introducing a nondet variable x that ranges over {x1, ,,,, xn}, instead of explicitly expanding the forall into P(x1) and ... and P(xn). This makes the encoding more compact and defers the reasoning to SAT/SMT solving time.

Copy link
Contributor

@carolynzech carolynzech Feb 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes a ton of sense, thank you @remi-delmas-3000!
(It'd be great if this text made it into the document--I think a longer example like this is really useful).


## Why use demonic non-determinism?

For the above example, one way to avoid the non-determinism is to use an array
or map to track all pointers rather than non-deterministically choosing to track
a single one.
The encoding thereof, however, may be much more expensive. See
[cbmc#6506](https://github.com/diffblue/cbmc/pull/6506) for the performance
penalty for such a change from non-determinism to arrays.

## What are disadvantages of demonic non-determinism?

The consequence of this choice of encoding is that any given model provided by
the back-end SAT/SMT solver can only encode a violation of one property of the
kind that the demonic non-determinism is used for, even if multiple properties
can be violated along a single execution path.
Comment on lines +45 to +47
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand quite what you mean here--could you give an example?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the example above, to find the safety violation for both p1 and p2 we need to invoke the SAT solver twice, because any choice of b1 and b2 only violates at most one assertion along the same trace

If on the other hand we had this program:

void foo() { 
   // allocate objects 
   int *p1 = malloc(...); 
   int *p2 = malloc(...); 
   
   // free/use
  free(p1);
   *p1 = 1;
   
   // free/use
  free(p2);
   *p2 = 2;
 }
void foo() { 
   // allocate objects 
   int *p1 = malloc(...); 
   int *p2 = malloc(...); 
   
   // free/use
   bool b1 = nondet_bool(); 
   dead_ptr = b1 ? p1 : dead_ptr;
  free(p1);
   assert(!__CPROVER_same_object(p1, dead_ptr));
   *p1 = 1;
   
   // free/use
   bool b2 = nondet_bool(); 
   dead_ptr = b2 ? p1 : dead_ptr;
  free(p2);
   assert(!__CPROVER_same_object(p2, dead_ptr));
   *p2 = 1;
 }

By picking b1 = 1 and b2 = 1 we can violate both assertions along a single trace.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah okay, so this is more of a statement about not being guaranteed to catch every violation in a single trace--it depends on the structure of your program. If the program looks like the one in your initial example (free, free, use, use) we'll only find one violation. But if the structure is (free, use, free, use) like in this example, we can find both. So we'll find at least one violation of the property if it exists.

When we now look at assumptions the problem is that these cannot accomplish what
we’d perhaps expect to get out of them: the `assume` statement with a condition of
"single global variable does not have the same value as the pointer to be
dereferenced" does not guarantee that the subsequent dereference is safe with
regard to objects having gone out of scope: the solver can choose to pick
non-deterministic values such that the address of the object having gone out of
scope has never been assigned.
Comment on lines +53 to +54
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get this part either--I think a short example would really help.

Copy link
Contributor

@remi-delmas-3000 remi-delmas-3000 Feb 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extending the example above, imagine we encapsulate the condition !__CPROVER_same_object(p, dead_ptr) into a predicate to hide implementation detail and offer an intrisinc to check for dead pointers:

extern void *dead_ptr;
bool __pointer_not_dead(void *ptr) {
  return !__CPROVER_same_object(ptr, dead_ptr);
}

Due to the demonic behavior of dead_ptr, this predicate will only work as expected under an assertion context (where it implicitly becomes quantified over all nondet choices that define dead_ptr).

If you try to use it in an assume:

#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

static void *dead_ptr;

bool __pointer_not_dead(void *ptr) {
  return __CPROVER_same_object(ptr, dead_ptr);
}

bool nondet_bool();
int main() {
  int *ptr = malloc(sizeof(int));
  bool alive = ptr != NULL;
  if (alive && nondet_bool()) {
    dead_ptr = nondet_bool() ? ptr : dead_ptr;
    free(ptr);
    alive = false;
  }
   // here `ptr` is either alive or dead depending on malloc failure and nondet choice
   
  // we try to assume we are on a path were the ptr is not deallocated
  __CPROVER_assume(__pointer_not_dead(ptr));
  assert(__pointer_not_dead(ptr));
  assert(alive);
  *ptr = 0;
}

This assume does not really force us to be on the path alive == true all it does is force the demonic monitor to not track the deallocation of ptr, and introduces unsoundness (because once we force the monitor to not look at an object, we miss all violations related to that object).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I think I get it now--this was helpful. To make sure that I actually get it, connecting it back to Kani:

transformer.add_pass(
queries,
UninitPass {
// Since this uses demonic non-determinism under the hood, should not assume the assertion.
safety_check_type: CheckType::new_safety_check_assert_no_assume(queries),
unsupported_check_type: unsupported_check_type.clone(),
mem_init_fn_cache: queries.kani_functions().clone(),
},
);

Imagine we have some array of length two bytes, and we want to execute this UninitPass to make sure that the memory we're reading is initialized. With demonic nondeterminism, we're going to track one byte at a time. We want to assert that the memory we're reading is initialized, which works in an assertion context since it gets implicitly quantified over all possible ways to track the bytes. But if then try to assume that the memory is initialized after the assert, we're really just telling the solver not to track a certain byte anymore, which would be unsound.


As such, demonic non-determinism poses a problem in context of modular
verification, where we need to assume that the given preconditions hold.
Such assumed preconditions would not be preceded by an assertion, and we would
effectively assume that none of the pointers involved in such preconditions were
being tracked by the solver.

## Where do CBMC and Kani use demonic non-determinism?

One way we use this today is tracking objects that go out of scope.
We use a single, global variable (initialized to the null pointer) to store the
address of objects going out of scope.
Upon a pointer dereference we then check whether the pointer being dereferenced
has the same (address) value as the value held in that single global variable.
If that is true then we know a dead object (one that has gone out of scope) is
being accessed, amounting to undefined behavior, and we fail that property check.

This setup can obviously only track a single object (the one that has gone out
of scope most recently).
By adding non-determinism when assigning the address to that single global
variable (`global-ptr = nondet ? global-ptr : &object-leaving-scope;`) we leave
the choice of which object to track to the solver.
This is sound even in presence of multiple objects (and, possibly, multiple
invalid pointers), because the solver will
return at least one counterexample if there is any undefined behavior, and can
indeed be forced to return all such counterexamples.
Comment on lines +77 to +80
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this sentence about soundness would better go in the "What is demonic nondeterminism" section (or you could address soundness twice), but I wouldn't leave it until the last sentence.

7 changes: 5 additions & 2 deletions docs/src/dev-documentation.md
Original file line number Diff line number Diff line change
Expand Up @@ -13,8 +13,11 @@ developers (including external contributors):
2. [Useful command-line instructions for Kani/CBMC/Git](./cheat-sheets.md).
3. [Development setup recommendations for working with `cbmc`](./cbmc-hacks.md).
4. [Development setup recommendations for working with `rustc`](./rustc-hacks.md).
5. [Guide for testing in Kani](./testing.md).
6. [Transition to StableMIR](./stable-mir.md).
5. [Transition to StableMIR](./stable-mir.md).
6. [Using cargo kani assess](./dev-assess.md).
7. [Demonic non-determinism](./demonic-nondet.md)
8. [Guide for testing in Kani](./testing.md).
9. [Performance comparison for Kani](./performance-comparisons.md).

> **NOTE**: The developer documentation is intended for Kani developers and not
users. At present, the project is under heavy development and some items
Expand Down
Loading