-
Notifications
You must be signed in to change notification settings - Fork 107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Document demonic non-determinism #3895
base: main
Are you sure you want to change the base?
Conversation
This PR adds developer-facing documentation to clarify what we mean by demonic non-determinism, and why we use it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for writing this up!
Note that this is an "if" statement, not "if, and only if:" there may also be | ||
some other invalid pointer `other_pointer` that is invalid at the time of | ||
dereference, and the back-end solver may choose to resolve the non-determinism | ||
such that the property fails because `global_pointer` equals `other_pointer`. | ||
We call such use of non-determinism _demonic non-determinism_, because the | ||
SAT or SMT-solver will resolve the non-deterministic choice such that a | ||
counterexample is reported _if_ there exists a model such that the overall | ||
property fails. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make sure I understand, imagine this case:
int* new_ptr_to_track = malloc(...);
int* other_ptr = malloc(...);
...
free(new_ptr_to_track); // the solver switches to new_ptr_to_track
free(other_ptr); // the solver stays with new_ptr_to_track
*other_ptr // this UB is not caught because we're tracking new_ptr_to_track still
I believe this part:
SAT or SMT-solver will resolve the non-deterministic choice such that a
counterexample is reported if there exists a model such that the overall
property fails.
suggests that this unsoundness is impossible; the solver will simulate all possible outcomes of the nondeterministic decision
global_pointer = if nondet { new_pointer_to_track } else { global_pointer };
, which includes the case where global_pointer
tracks other_ptr
, and thereby find the UB.
Is it an accurate mental model to think of this as a sort of breadth-first search, where we explore all possible assignments to global_pointer
at every point of the program and succeed iff the property holds in all cases? (I don't expect the solver to actually perform such a search, since that would be expensive, but is that in effect what happens)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think for those with less background in non-determinism, I'd like a final sentence like:
Thus, choosing non-deterministically is not the same as choosing randomly--by choosing non-determistically, soundness is ensured, since the solver will always resolve the non-deterministic choice such that it finds violations of the property if they exist.
I think my primary confusion was initially conflating nondeterminism with randomness, so thinking that only tracking one object at a time would be unsound. This is somewhat redundant with what you have--I just want to hammer this point home, since I suspect this will be the most confusing/unintuitive point for people.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it an accurate mental model to think of this as a sort of breadth-first search, where we explore all possible assignments to
global_pointer
at every point of the program and succeed iff the property holds in all cases?
The global_pointer
is indeed a symbolic pointer that ranges over all possible pointers in the program, controlled by the nondet
choices used to update the variable.
These choices are hidden inputs for the program, so if you have n
choices, the final formula encodes 2^n executions, one for each valuation of these variables. The breadth-first search analogy is not the best one though. We don't control in which order the SAT solver explores these decisions.
void foo() {
// allocate objects
int *p1 = malloc(...);
int *p2 = malloc(...);
// free objects
free(p1);
free(p2);
// use dead pointers
*p1 = 1;
*p2 = 2;
}
The instrumented program looks like this:
// demonic monitor
static void* dead_ptr = (void *)0;
void foo() {
// allocate objects
int *p1 = malloc(...);
int *p2 = malloc(...);
// free objects
// demonic choice variable
bool b1 = nondet_bool();
dead_ptr = b1 ? p1 : dead_ptr;
free(p1);
// demonic choice variable
bool b2 = nondet_bool();
dead_ptr = b2 ? p2 : dead_ptr;
free(p2);
// use dead pointers
assert(!__CPROVER_same_object(p1, dead_ptr));
*p1 = 1;
assert(!__CPROVER_same_object(p2, dead_ptr));
*p2 = 2;
}
So our program is controlled by b1, b2 we have 4 possible ways of picking values for these variables:
- 0, 0 = track null
- 0, 1 = track null then switch to p2
- 1, 0 = track null then switch to p1
- 1, 1 = track null then switch to p1 then switch to p2
The SAT solver analyzes all of them, but there is no notion of depth/breadth first.
The idea is that any property of the form forall x \in {x1, ,,,, xn}, P(x)
can be checked by introducing a nondet variable x
that ranges over {x1, ,,,, xn}
, instead of explicitly expanding the forall into P(x1) and ... and P(xn)
. This makes the encoding more compact and defers the reasoning to SAT/SMT solving time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes a ton of sense, thank you @remi-delmas-3000!
(It'd be great if this text made it into the document--I think a longer example like this is really useful).
the back-end SAT/SMT solver can only encode a violation of one property of the | ||
kind that the demonic non-determinism is used for, even if multiple properties | ||
can be violated along a single execution path. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand quite what you mean here--could you give an example?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the example above, to find the safety violation for both p1 and p2 we need to invoke the SAT solver twice, because any choice of b1 and b2 only violates at most one assertion along the same trace
If on the other hand we had this program:
void foo() {
// allocate objects
int *p1 = malloc(...);
int *p2 = malloc(...);
// free/use
free(p1);
*p1 = 1;
// free/use
free(p2);
*p2 = 2;
}
void foo() {
// allocate objects
int *p1 = malloc(...);
int *p2 = malloc(...);
// free/use
bool b1 = nondet_bool();
dead_ptr = b1 ? p1 : dead_ptr;
free(p1);
assert(!__CPROVER_same_object(p1, dead_ptr));
*p1 = 1;
// free/use
bool b2 = nondet_bool();
dead_ptr = b2 ? p1 : dead_ptr;
free(p2);
assert(!__CPROVER_same_object(p2, dead_ptr));
*p2 = 1;
}
By picking b1 = 1
and b2 = 1
we can violate both assertions along a single trace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah okay, so this is more of a statement about not being guaranteed to catch every violation in a single trace--it depends on the structure of your program. If the program looks like the one in your initial example (free, free, use, use) we'll only find one violation. But if the structure is (free, use, free, use) like in this example, we can find both. So we'll find at least one violation of the property if it exists.
non-deterministic values such that the address of the object having gone out of | ||
scope has never been assigned. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't get this part either--I think a short example would really help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extending the example above, imagine we encapsulate the condition !__CPROVER_same_object(p, dead_ptr)
into a predicate to hide implementation detail and offer an intrisinc to check for dead pointers:
extern void *dead_ptr;
bool __pointer_not_dead(void *ptr) {
return !__CPROVER_same_object(ptr, dead_ptr);
}
Due to the demonic behavior of dead_ptr
, this predicate will only work as expected under an assertion context (where it implicitly becomes quantified over all nondet choices that define dead_ptr
).
If you try to use it in an assume:
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
static void *dead_ptr;
bool __pointer_not_dead(void *ptr) {
return __CPROVER_same_object(ptr, dead_ptr);
}
bool nondet_bool();
int main() {
int *ptr = malloc(sizeof(int));
bool alive = ptr != NULL;
if (alive && nondet_bool()) {
dead_ptr = nondet_bool() ? ptr : dead_ptr;
free(ptr);
alive = false;
}
// here `ptr` is either alive or dead depending on malloc failure and nondet choice
// we try to assume we are on a path were the ptr is not deallocated
__CPROVER_assume(__pointer_not_dead(ptr));
assert(__pointer_not_dead(ptr));
assert(alive);
*ptr = 0;
}
This assume does not really force us to be on the path alive == true
all it does is force the demonic monitor to not track the deallocation of ptr
, and introduces unsoundness (because once we force the monitor to not look at an object, we miss all violations related to that object).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I think I get it now--this was helpful. To make sure that I actually get it, connecting it back to Kani:
kani/kani-compiler/src/kani_middle/transform/mod.rs
Lines 97 to 105 in 366ce20
transformer.add_pass( | |
queries, | |
UninitPass { | |
// Since this uses demonic non-determinism under the hood, should not assume the assertion. | |
safety_check_type: CheckType::new_safety_check_assert_no_assume(queries), | |
unsupported_check_type: unsupported_check_type.clone(), | |
mem_init_fn_cache: queries.kani_functions().clone(), | |
}, | |
); |
Imagine we have some array of length two bytes, and we want to execute this UninitPass
to make sure that the memory we're reading is initialized. With demonic nondeterminism, we're going to track one byte at a time. We want to assert that the memory we're reading is initialized, which works in an assertion context since it gets implicitly quantified over all possible ways to track the bytes. But if then try to assume that the memory is initialized after the assert, we're really just telling the solver not to track a certain byte anymore, which would be unsound.
This is sound even in presence of multiple objects (and, possibly, multiple | ||
invalid pointers), because the solver will | ||
return at least one counterexample if there is any undefined behavior, and can | ||
indeed be forced to return all such counterexamples. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this sentence about soundness would better go in the "What is demonic nondeterminism" section (or you could address soundness twice), but I wouldn't leave it until the last sentence.
Co-authored-by: Carolyn Zech <cmzech@amazon.com>
This PR adds developer-facing documentation to clarify what we mean by demonic non-determinism, and why we use it.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 and MIT licenses.