-
Notifications
You must be signed in to change notification settings - Fork 107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Document demonic non-determinism #3895
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
@@ -0,0 +1,80 @@ | ||||||||||||||||||||
# Demonic non-determinism in CBMC and Kani | ||||||||||||||||||||
|
||||||||||||||||||||
CBMC and Kani use demonic non-determinism to more efficiently prove certain | ||||||||||||||||||||
classes of program properties. | ||||||||||||||||||||
This document describes what we mean by "demonic non-determinism," how and why | ||||||||||||||||||||
CBMC and Kani use it, and what challenges arise as a consequence thereof. | ||||||||||||||||||||
|
||||||||||||||||||||
## What is demonic non-determinism? | ||||||||||||||||||||
|
||||||||||||||||||||
When we seek to prove that a property holds for all possible values of some | ||||||||||||||||||||
object, a counterexample demonstrates the existence of at least one such | ||||||||||||||||||||
value that violates the property. | ||||||||||||||||||||
Other properties range across all objects or expressions of a particular kind, | ||||||||||||||||||||
e.g., we want to prove that for all dereference expressions the underlying | ||||||||||||||||||||
pointer is valid. | ||||||||||||||||||||
For a counterexample to this property it suffices to find one pointer that | ||||||||||||||||||||
cannot be safely dereferenced. | ||||||||||||||||||||
We can use this fact to non-deterministically choose to only track a single | ||||||||||||||||||||
pointer across an execution via expressions of the form | ||||||||||||||||||||
`global_pointer = if nondet { new_pointer_to_track } else { global_pointer };`. | ||||||||||||||||||||
We can now use `global_pointer` in describing properties such that the property | ||||||||||||||||||||
fails if `new_pointer_to_track` is invalid and `global_pointer` equals | ||||||||||||||||||||
`new_pointer_to_track`. | ||||||||||||||||||||
Note that this is an "if" statement, not "if, and only if:" there may also be | ||||||||||||||||||||
some other invalid pointer `other_pointer` that is invalid at the time of | ||||||||||||||||||||
dereference, and the back-end solver may choose to resolve the non-determinism | ||||||||||||||||||||
such that the property fails because `global_pointer` equals `other_pointer`. | ||||||||||||||||||||
We call such use of non-determinism _demonic non-determinism_, because the | ||||||||||||||||||||
SAT or SMT-solver will resolve the non-deterministic choice such that a | ||||||||||||||||||||
counterexample is reported _if_ there exists a model such that the overall | ||||||||||||||||||||
property fails. | ||||||||||||||||||||
|
||||||||||||||||||||
## Why use demonic non-determinism? | ||||||||||||||||||||
|
||||||||||||||||||||
For the above example, one way to avoid the non-determinism is to use an array | ||||||||||||||||||||
or map to track all pointers rather than non-deterministically choosing to track | ||||||||||||||||||||
a single one. | ||||||||||||||||||||
The encoding thereof, however, may be much more expensive. See | ||||||||||||||||||||
[cbmc#6506](https://github.com/diffblue/cbmc/pull/6506) for the performance | ||||||||||||||||||||
penalty for such a change from non-determinism to arrays. | ||||||||||||||||||||
|
||||||||||||||||||||
## What are disadvantages of demonic non-determinism? | ||||||||||||||||||||
|
||||||||||||||||||||
The consequence of this choice of encoding is that any given model provided by | ||||||||||||||||||||
the back-end SAT/SMT solver can only encode a violation of one property of the | ||||||||||||||||||||
kind that the demonic non-determinism is used for, even if multiple properties | ||||||||||||||||||||
can be violated along a single execution path. | ||||||||||||||||||||
Comment on lines
+45
to
+47
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't understand quite what you mean here--could you give an example? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Looking at the example above, to find the safety violation for both p1 and p2 we need to invoke the SAT solver twice, because any choice of b1 and b2 only violates at most one assertion along the same trace If on the other hand we had this program: void foo() {
// allocate objects
int *p1 = malloc(...);
int *p2 = malloc(...);
// free/use
free(p1);
*p1 = 1;
// free/use
free(p2);
*p2 = 2;
} void foo() {
// allocate objects
int *p1 = malloc(...);
int *p2 = malloc(...);
// free/use
bool b1 = nondet_bool();
dead_ptr = b1 ? p1 : dead_ptr;
free(p1);
assert(!__CPROVER_same_object(p1, dead_ptr));
*p1 = 1;
// free/use
bool b2 = nondet_bool();
dead_ptr = b2 ? p1 : dead_ptr;
free(p2);
assert(!__CPROVER_same_object(p2, dead_ptr));
*p2 = 1;
} By picking There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ah okay, so this is more of a statement about not being guaranteed to catch every violation in a single trace--it depends on the structure of your program. If the program looks like the one in your initial example (free, free, use, use) we'll only find one violation. But if the structure is (free, use, free, use) like in this example, we can find both. So we'll find at least one violation of the property if it exists. |
||||||||||||||||||||
When we now look at assumptions the problem is that these cannot accomplish what | ||||||||||||||||||||
we’d perhaps expect to get out of them: the `assume` statement with a condition of | ||||||||||||||||||||
"single global variable does not have the same value as the pointer to be | ||||||||||||||||||||
dereferenced" does not guarantee that the subsequent dereference is safe with | ||||||||||||||||||||
regard to objects having gone out of scope: the solver can choose to pick | ||||||||||||||||||||
non-deterministic values such that the address of the object having gone out of | ||||||||||||||||||||
scope has never been assigned. | ||||||||||||||||||||
Comment on lines
+53
to
+54
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't get this part either--I think a short example would really help. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Extending the example above, imagine we encapsulate the condition extern void *dead_ptr;
bool __pointer_not_dead(void *ptr) {
return !__CPROVER_same_object(ptr, dead_ptr);
} Due to the demonic behavior of If you try to use it in an assume: #include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
static void *dead_ptr;
bool __pointer_not_dead(void *ptr) {
return __CPROVER_same_object(ptr, dead_ptr);
}
bool nondet_bool();
int main() {
int *ptr = malloc(sizeof(int));
bool alive = ptr != NULL;
if (alive && nondet_bool()) {
dead_ptr = nondet_bool() ? ptr : dead_ptr;
free(ptr);
alive = false;
}
// here `ptr` is either alive or dead depending on malloc failure and nondet choice
// we try to assume we are on a path were the ptr is not deallocated
__CPROVER_assume(__pointer_not_dead(ptr));
assert(__pointer_not_dead(ptr));
assert(alive);
*ptr = 0;
} This assume does not really force us to be on the path There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Okay, I think I get it now--this was helpful. To make sure that I actually get it, connecting it back to Kani: kani/kani-compiler/src/kani_middle/transform/mod.rs Lines 97 to 105 in 366ce20
Imagine we have some array of length two bytes, and we want to execute this |
||||||||||||||||||||
|
||||||||||||||||||||
As such, demonic non-determinism poses a problem in context of modular | ||||||||||||||||||||
verification, where we need to assume that the given preconditions hold. | ||||||||||||||||||||
Such assumed preconditions would not be preceded by an assertion, and we would | ||||||||||||||||||||
effectively assume that none of the pointers involved in such preconditions were | ||||||||||||||||||||
being tracked by the solver. | ||||||||||||||||||||
|
||||||||||||||||||||
## Where do CBMC and Kani use demonic non-determinism? | ||||||||||||||||||||
|
||||||||||||||||||||
One way we use this today is tracking objects that go out of scope. | ||||||||||||||||||||
We use a single, global variable (initialized to the null pointer) to store the | ||||||||||||||||||||
address of objects going out of scope. | ||||||||||||||||||||
Upon a pointer dereference we then check whether the pointer being dereferenced | ||||||||||||||||||||
has the same (address) value as the value held in that single global variable. | ||||||||||||||||||||
If that is true then we know a dead object (one that has gone out of scope) is | ||||||||||||||||||||
being accessed, amounting to undefined behavior, and we fail that property check. | ||||||||||||||||||||
|
||||||||||||||||||||
This setup can obviously only track a single object (the one that has gone out | ||||||||||||||||||||
of scope most recently). | ||||||||||||||||||||
By adding non-determinism when assigning the address to that single global | ||||||||||||||||||||
variable (`global-ptr = nondet ? global-ptr : &object-leaving-scope;`) we leave | ||||||||||||||||||||
the choice of which object to track to the solver. | ||||||||||||||||||||
This is sound even in presence of multiple objects (and, possibly, multiple | ||||||||||||||||||||
invalid pointers), because the solver will | ||||||||||||||||||||
return at least one counterexample if there is any undefined behavior, and can | ||||||||||||||||||||
indeed be forced to return all such counterexamples. | ||||||||||||||||||||
Comment on lines
+77
to
+80
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think this sentence about soundness would better go in the "What is demonic nondeterminism" section (or you could address soundness twice), but I wouldn't leave it until the last sentence. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make sure I understand, imagine this case:
I believe this part:
suggests that this unsoundness is impossible; the solver will simulate all possible outcomes of the nondeterministic decision
global_pointer = if nondet { new_pointer_to_track } else { global_pointer };
, which includes the case whereglobal_pointer
tracksother_ptr
, and thereby find the UB.Is it an accurate mental model to think of this as a sort of breadth-first search, where we explore all possible assignments to
global_pointer
at every point of the program and succeed iff the property holds in all cases? (I don't expect the solver to actually perform such a search, since that would be expensive, but is that in effect what happens)?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think for those with less background in non-determinism, I'd like a final sentence like:
I think my primary confusion was initially conflating nondeterminism with randomness, so thinking that only tracking one object at a time would be unsound. This is somewhat redundant with what you have--I just want to hammer this point home, since I suspect this will be the most confusing/unintuitive point for people.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The
global_pointer
is indeed a symbolic pointer that ranges over all possible pointers in the program, controlled by thenondet
choices used to update the variable.These choices are hidden inputs for the program, so if you have
n
choices, the final formula encodes 2^n executions, one for each valuation of these variables. The breadth-first search analogy is not the best one though. We don't control in which order the SAT solver explores these decisions.The instrumented program looks like this:
So our program is controlled by b1, b2 we have 4 possible ways of picking values for these variables:
The SAT solver analyzes all of them, but there is no notion of depth/breadth first.
The idea is that any property of the form
forall x \in {x1, ,,,, xn}, P(x)
can be checked by introducing a nondet variablex
that ranges over{x1, ,,,, xn}
, instead of explicitly expanding the forall intoP(x1) and ... and P(xn)
. This makes the encoding more compact and defers the reasoning to SAT/SMT solving time.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes a ton of sense, thank you @remi-delmas-3000!
(It'd be great if this text made it into the document--I think a longer example like this is really useful).