Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document demonic non-determinism #3895

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

tautschnig
Copy link
Member

This PR adds developer-facing documentation to clarify what we mean by demonic non-determinism, and why we use it.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 and MIT licenses.

This PR adds developer-facing documentation to clarify what we mean by
demonic non-determinism, and why we use it.
@tautschnig tautschnig requested a review from a team as a code owner February 18, 2025 13:56
Copy link
Contributor

@carolynzech carolynzech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for writing this up!

Comment on lines +24 to +31
Note that this is an "if" statement, not "if, and only if:" there may also be
some other invalid pointer `other_pointer` that is invalid at the time of
dereference, and the back-end solver may choose to resolve the non-determinism
such that the property fails because `global_pointer` equals `other_pointer`.
We call such use of non-determinism _demonic non-determinism_, because the
SAT or SMT-solver will resolve the non-deterministic choice such that a
counterexample is reported _if_ there exists a model such that the overall
property fails.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make sure I understand, imagine this case:

int* new_ptr_to_track = malloc(...);
int* other_ptr = malloc(...);
...
free(new_ptr_to_track); // the solver switches to new_ptr_to_track
free(other_ptr); // the solver stays with new_ptr_to_track
*other_ptr // this UB is not caught because we're tracking new_ptr_to_track still

I believe this part:

SAT or SMT-solver will resolve the non-deterministic choice such that a
counterexample is reported if there exists a model such that the overall
property fails.

suggests that this unsoundness is impossible; the solver will simulate all possible outcomes of the nondeterministic decision
global_pointer = if nondet { new_pointer_to_track } else { global_pointer };, which includes the case where global_pointer tracks other_ptr, and thereby find the UB.
Is it an accurate mental model to think of this as a sort of breadth-first search, where we explore all possible assignments to global_pointer at every point of the program and succeed iff the property holds in all cases? (I don't expect the solver to actually perform such a search, since that would be expensive, but is that in effect what happens)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for those with less background in non-determinism, I'd like a final sentence like:

Thus, choosing non-deterministically is not the same as choosing randomly--by choosing non-determistically, soundness is ensured, since the solver will always resolve the non-deterministic choice such that it finds violations of the property if they exist.

I think my primary confusion was initially conflating nondeterminism with randomness, so thinking that only tracking one object at a time would be unsound. This is somewhat redundant with what you have--I just want to hammer this point home, since I suspect this will be the most confusing/unintuitive point for people.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it an accurate mental model to think of this as a sort of breadth-first search, where we explore all possible assignments to global_pointer at every point of the program and succeed iff the property holds in all cases?

The global_pointer is indeed a symbolic pointer that ranges over all possible pointers in the program, controlled by the nondet choices used to update the variable.

These choices are hidden inputs for the program, so if you have n choices, the final formula encodes 2^n executions, one for each valuation of these variables. The breadth-first search analogy is not the best one though. We don't control in which order the SAT solver explores these decisions.

void foo() {
  // allocate objects
  int *p1 = malloc(...);
  int *p2 = malloc(...);

  // free objects
  free(p1);
  free(p2);

  // use dead pointers
  *p1 = 1;
  *p2 = 2;
}

The instrumented program looks like this:

// demonic monitor
static void* dead_ptr = (void *)0;

void foo() {

  // allocate objects
  int *p1 = malloc(...);
  int *p2 = malloc(...);

  // free objects
  // demonic choice variable
  bool b1 = nondet_bool();
  dead_ptr = b1  ? p1 : dead_ptr;
  free(p1);


  // demonic choice variable
  bool b2 = nondet_bool();
  dead_ptr = b2  ? p2 : dead_ptr;
  free(p2);

  // use dead pointers
  assert(!__CPROVER_same_object(p1, dead_ptr));
  *p1 = 1;
  assert(!__CPROVER_same_object(p2, dead_ptr));
  *p2 = 2;
}

So our program is controlled by b1, b2 we have 4 possible ways of picking values for these variables:

  1. 0, 0 = track null
  2. 0, 1 = track null then switch to p2
  3. 1, 0 = track null then switch to p1
  4. 1, 1 = track null then switch to p1 then switch to p2

The SAT solver analyzes all of them, but there is no notion of depth/breadth first.

The idea is that any property of the form forall x \in {x1, ,,,, xn}, P(x) can be checked by introducing a nondet variable x that ranges over {x1, ,,,, xn}, instead of explicitly expanding the forall into P(x1) and ... and P(xn). This makes the encoding more compact and defers the reasoning to SAT/SMT solving time.

Copy link
Contributor

@carolynzech carolynzech Feb 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes a ton of sense, thank you @remi-delmas-3000!
(It'd be great if this text made it into the document--I think a longer example like this is really useful).

Comment on lines +45 to +47
the back-end SAT/SMT solver can only encode a violation of one property of the
kind that the demonic non-determinism is used for, even if multiple properties
can be violated along a single execution path.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand quite what you mean here--could you give an example?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the example above, to find the safety violation for both p1 and p2 we need to invoke the SAT solver twice, because any choice of b1 and b2 only violates at most one assertion along the same trace

If on the other hand we had this program:

void foo() { 
   // allocate objects 
   int *p1 = malloc(...); 
   int *p2 = malloc(...); 
   
   // free/use
  free(p1);
   *p1 = 1;
   
   // free/use
  free(p2);
   *p2 = 2;
 }
void foo() { 
   // allocate objects 
   int *p1 = malloc(...); 
   int *p2 = malloc(...); 
   
   // free/use
   bool b1 = nondet_bool(); 
   dead_ptr = b1 ? p1 : dead_ptr;
  free(p1);
   assert(!__CPROVER_same_object(p1, dead_ptr));
   *p1 = 1;
   
   // free/use
   bool b2 = nondet_bool(); 
   dead_ptr = b2 ? p1 : dead_ptr;
  free(p2);
   assert(!__CPROVER_same_object(p2, dead_ptr));
   *p2 = 1;
 }

By picking b1 = 1 and b2 = 1 we can violate both assertions along a single trace.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah okay, so this is more of a statement about not being guaranteed to catch every violation in a single trace--it depends on the structure of your program. If the program looks like the one in your initial example (free, free, use, use) we'll only find one violation. But if the structure is (free, use, free, use) like in this example, we can find both. So we'll find at least one violation of the property if it exists.

Comment on lines +53 to +54
non-deterministic values such that the address of the object having gone out of
scope has never been assigned.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get this part either--I think a short example would really help.

Copy link
Contributor

@remi-delmas-3000 remi-delmas-3000 Feb 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extending the example above, imagine we encapsulate the condition !__CPROVER_same_object(p, dead_ptr) into a predicate to hide implementation detail and offer an intrisinc to check for dead pointers:

extern void *dead_ptr;
bool __pointer_not_dead(void *ptr) {
  return !__CPROVER_same_object(ptr, dead_ptr);
}

Due to the demonic behavior of dead_ptr, this predicate will only work as expected under an assertion context (where it implicitly becomes quantified over all nondet choices that define dead_ptr).

If you try to use it in an assume:

#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

static void *dead_ptr;

bool __pointer_not_dead(void *ptr) {
  return __CPROVER_same_object(ptr, dead_ptr);
}

bool nondet_bool();
int main() {
  int *ptr = malloc(sizeof(int));
  bool alive = ptr != NULL;
  if (alive && nondet_bool()) {
    dead_ptr = nondet_bool() ? ptr : dead_ptr;
    free(ptr);
    alive = false;
  }
   // here `ptr` is either alive or dead depending on malloc failure and nondet choice
   
  // we try to assume we are on a path were the ptr is not deallocated
  __CPROVER_assume(__pointer_not_dead(ptr));
  assert(__pointer_not_dead(ptr));
  assert(alive);
  *ptr = 0;
}

This assume does not really force us to be on the path alive == true all it does is force the demonic monitor to not track the deallocation of ptr, and introduces unsoundness (because once we force the monitor to not look at an object, we miss all violations related to that object).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I think I get it now--this was helpful. To make sure that I actually get it, connecting it back to Kani:

transformer.add_pass(
queries,
UninitPass {
// Since this uses demonic non-determinism under the hood, should not assume the assertion.
safety_check_type: CheckType::new_safety_check_assert_no_assume(queries),
unsupported_check_type: unsupported_check_type.clone(),
mem_init_fn_cache: queries.kani_functions().clone(),
},
);

Imagine we have some array of length two bytes, and we want to execute this UninitPass to make sure that the memory we're reading is initialized. With demonic nondeterminism, we're going to track one byte at a time. We want to assert that the memory we're reading is initialized, which works in an assertion context since it gets implicitly quantified over all possible ways to track the bytes. But if then try to assume that the memory is initialized after the assert, we're really just telling the solver not to track a certain byte anymore, which would be unsound.

Comment on lines +77 to +80
This is sound even in presence of multiple objects (and, possibly, multiple
invalid pointers), because the solver will
return at least one counterexample if there is any undefined behavior, and can
indeed be forced to return all such counterexamples.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this sentence about soundness would better go in the "What is demonic nondeterminism" section (or you could address soundness twice), but I wouldn't leave it until the last sentence.

Co-authored-by: Carolyn Zech <cmzech@amazon.com>
@tautschnig tautschnig self-assigned this Feb 18, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants