Skip to content

Commit 283ccda

Browse files
authored
Merge pull request #490 from nikomatsakis/salsa-30-new-update-regime
salsa 3.0
2 parents 05b4e3e + 1544ee9 commit 283ccda

File tree

101 files changed

+3961
-1212
lines changed

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

101 files changed

+3961
-1212
lines changed

book/src/SUMMARY.md

+2
Original file line numberDiff line numberDiff line change
@@ -16,6 +16,7 @@
1616
- [Defining the checker](./tutorial/checker.md)
1717
- [Defining the interpreter](./tutorial/interpreter.md)
1818
- [Reference](./reference.md)
19+
- [Durability](./reference/durability.md)
1920
- [Algorithm](./reference/algorithm.md)
2021
- [Common patterns](./common_patterns.md)
2122
- [Selection](./common_patterns/selection.md)
@@ -31,6 +32,7 @@
3132
- [Plumbing](./plumbing.md)
3233
- [Jars and ingredients](./plumbing/jars_and_ingredients.md)
3334
- [Databases and runtime](./plumbing/database_and_runtime.md)
35+
- [The db lifetime on tracked/interned structs](./plumbing/db_lifetime.md)
3436
- [Tracked structures](./plumbing/tracked_structs.md)
3537
- [Query operations](./plumbing/query_ops.md)
3638
- [maybe changed after](./plumbing/maybe_changed_after.md)

book/src/overview.md

+8-4
Original file line numberDiff line numberDiff line change
@@ -114,9 +114,12 @@ Finally, you can also modify the value of an input field by using the setter met
114114
Since this is modifying the input, the setter takes an `&mut`-reference to the database:
115115

116116
```rust
117-
file.set_contents(&mut db, String::from("fn foo() { /* add a comment */ }"));
117+
file.set_contents(&mut db).to(String::from("fn foo() { /* add a comment */ }"));
118118
```
119119

120+
Note that the setter method `set_contents` returns a "builder".
121+
This gives the ability to set the [durability](./reference/durability.md) and other advanced concepts.
122+
120123
## Tracked functions
121124

122125
Once you've defined your inputs, the next thing to define are **tracked functions**:
@@ -147,12 +150,13 @@ Tracked functions can return any clone-able type. A clone is required since, whe
147150

148151
**Tracked structs** are intermediate structs created during your computation.
149152
Like inputs, their fields are stored inside the database, and the struct itself just wraps an id.
150-
Unlike inputs, they can only be created inside a tracked function, and their fields can never change once they are created.
151-
Getter methods are provided to read the fields, but there are no setter methods[^specify]. Example:
153+
Unlike inputs, they can only be created inside a tracked function, and their fields can never change once they are created (until the next revision, at least).
154+
Getter methods are provided to read the fields, but there are no setter methods.
155+
Example:
152156

153157
```rust
154158
#[salsa::tracked]
155-
struct Ast {
159+
struct Ast<'db> {
156160
#[return_ref]
157161
top_level_items: Vec<Item>,
158162
}

book/src/plumbing/db_lifetime.md

+238
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,238 @@
1+
# The `'db` lifetime
2+
3+
[Tracked](./tracked_structs.md) and interned structs are both declared with a `'db` lifetime.
4+
This lifetime is linked to the `db: &DB` reference used to create them.
5+
The `'db` lifetime has several implications:
6+
7+
* It ensures that the user does not create a new salsa revision while a tracked/interned struct is in active use. Creating a new salsa revision requires modifying an input which requires an `&mut DB` reference, therefore it cannot occur during `'db`.
8+
* The struct may not even exist in the new salsa revision so allowing access would be confusing.
9+
* It permits the structs to be implemented using a pointer rather than a `salsa::Id`, which in turn means more efficient field access (no read locks required).
10+
11+
This section discusses the unsafe code used for pointer-based access along with the reasoning behind it. To be concrete, we'll focus on tracked structs -- interned structs are very similar.
12+
13+
## A note on UB
14+
15+
When we say in this page "users cannot do X", we mean without Undefined Behavior (e.g., by transmuting integers around etc).
16+
17+
## Proof obligations
18+
19+
Here is a typical sequence of operations for a tracked struct along with the user operations that will require us to prove unsafe assertions:
20+
21+
* A tracked function `f` executes in revision R0 and creates a tracked struct with `#[id]` fields `K` for the first time.
22+
* `K` will be stored in the interning hashmap and mapped to a fresh identifier `id`.
23+
* The identifier `id` will be used as the key in the `StructMap` and point to a freshly created allocation `alloc : Alloc`.
24+
* A `ts: TS<'db>` is created from the raw pointer `alloc` and returned to the user.
25+
* The value of the field `field` is accessed on the tracked struct instance `ts` by invoking the method `ts.field(db)`
26+
* *Unsafe:* This accesses the raw pointer to `alloc`.* A new revision R1 begins.
27+
* The tracked function `f` does not re-execute in R1.
28+
* The value of the field `field` is accessed on the tracked struct instance `ts` by invoking the method `ts.field(db)`
29+
* *Unsafe:* This accesses the raw pointer to `alloc`.* A new revision R2 begins.
30+
* The tracked function `f` does reexecute in R2 and it again creates a tracked struct with key `K` and with (Some) distinct field values.
31+
* The fields for `ts` are updated.
32+
* The value of the field `field` is accessed on the tracked struct instance `ts` by invoking the method `ts.field(db)`
33+
* *Unsafe:* This accesses the raw pointer to `alloc`.
34+
* A new revision R3 begins.
35+
* When `f` executes this time it does NOT create a tracked struct with key `K`. The tracked struct `ts` is placed in the "to be deleted" list.
36+
* A new revision R4 begins:
37+
* The allocation `alloc` is freed.
38+
39+
As noted in the list, the core "unsafe" operation that users can perform is to access the fields of a tracked struct.
40+
Tracked structs store a raw pointer to the `alloc`, owned by the ingredient, that contains their field data.
41+
Accessing the fields of a tracked struct returns a `&`-reference to fields stored in that `alloc`, which means we must ensure Rust's two core constraints are satisfied for the lifetime of that reference:
42+
43+
* The allocation `alloc` will not be freed (i.e., not be dropped)
44+
* The contents of the fields will not be mutated
45+
46+
As the sequence above illustrates, we have to show that those two constraints are true in a variety of circumstances:
47+
48+
* newly created tracked structs
49+
* tracked structs that were created in prior revisions and re-validated in this revision
50+
* tracked structs whose fields were updated in this revision
51+
* tracked structs that were *not* created in this revision
52+
53+
## Definitions
54+
55+
For every tracked struct `ts` we say that it has a **defining query** `f(..)`.
56+
This refers to a particular invocation of the tracked function `f` with a particular set of arguments `..`.
57+
This defining query is unique within a revision, meaning that `f` executes at most once with that same set of arguments.
58+
59+
We say that a query has *executed in a revision R* if its function body was executed. When this occurs, all tracked structs defined (created) by that query will be recorded along with the query's result.
60+
61+
We say that a query has been *validated in a revision R* if the salsa system determined that its inputs did not change and so skipped executing it. This also triggers the tracked structs defined by that query to be considered validated (in particular, we execute a function on them which updates some internal fields, as described below).
62+
63+
When we talk about `ts`, we mean
64+
65+
## Theorem: At the start of a new revision, all references to `ts` are within salsa's database
66+
67+
After `ts` is deleted, there may be other memoized values still reference `ts`, but they must have a red input query.
68+
**Is this true even if there are user bugs like non-deterministic functions?**
69+
Argument: yes, because of non-forgery, those memoized values could not be accessed.
70+
How did those memoized values obtain the `TS<'db>` value in the first place?
71+
It must have come from a function argument (XX: what about thread-local state).
72+
Therefore, to access the value, they would have to provide those function arguments again.
73+
But how did they get them?
74+
75+
Potential holes:
76+
77+
* Thread-local APIs that let you thread `'db` values down in an "invisible" way, so that you can return them without them showing up in your arguments -- e.g. a tracked function `() -> S<'db>` that obtains its value from thread-local state.
78+
* We might be able to sanity check against this with enough effort by defining some traits that guarantee that every lifetime tagged thing in your result *could have* come from one of your arguments, but I don't think we can prove it altogether. We either have to tell users "don't do that" or we need to have some kind of dynamic check, e.g. with a kind of versioned pointer. Note that it does require unsafe code at present but only because of the limits of our existing APIs.
79+
* Alternatively we can do a better job cleaning up deleted stuff. This we could do.
80+
* what about weird `Eq` implementations and the like? Do we have to make those unsafe?
81+
82+
## Theorem: To access a tracked struct `ts` in revision R, the defining query `f(..)` must have either *executed* or been *validated* in the revision R.
83+
84+
This is the core bit of reasoning underlying most of what follows.
85+
The idea is that users cannot "forge" a tracked struct instance `ts`.
86+
They must have gotten it through salsa's internal mechanisms.
87+
This is important because salsa will provide `&`-references to fields within that remain valid during a revision.
88+
But at the start of a new revision salsa may opt to modify those fields or even free the allocation.
89+
This is safe because users cannot have references to `ts` at the start of a new revision.
90+
91+
92+
### Lemma
93+
94+
95+
We will prove it by proceeding through the revisions in the life cycle above (this can be considered a proof by induction).
96+
97+
### Before `ts` is first created in R0
98+
99+
Users must have originally obtained `ts: TS<'db>` by invoking `TS::new(&db, ...)`.
100+
This is because creating an instance of `TS` requires providing a `NonNull<salsa::tracked_struct::ValueStruct>` pointer
101+
to an unsafe function whose contract requires the pointer's validity.
102+
103+
**FIXME:** This is not strictly true, I think the constructor is just a private tuple ctor, we should fix that.
104+
105+
### During R0
106+
107+
108+
###
109+
110+
111+
### Inductive case: Consider some revision R
112+
113+
We start by showing some circumstances that cannot occur:
114+
115+
* accessing the field of a tracked struct `ts` that was never created
116+
* accessing the field of a tracked struct `ts` after it is freed
117+
118+
### Lemma (no forgery): Users cannot forge a tracked struct
119+
120+
The first observation is that users cannot "forge" an instance of a tracked struct `ts`.
121+
They are required to produce a pointer to an `Alloc`.
122+
This implies that every tracked struct `ts` originated in the ingredient.
123+
The same is not true for input structs, for example, because they are created from integer identifiers and users could just make those up.
124+
125+
### Lemma (within one rev): Users cannot hold a tracked struct `ts` across revisions
126+
127+
The lifetime `'db` of the tracked struct `ts: TS<'db>` is created from a `db: &'db dyn Db` handle.
128+
Beginning a new revision requires an `&mut` reference.
129+
Therefore so long as users are actively using the value `ts` the database cannot start a new revision.
130+
131+
*Check:* What if users had two databases and invoked internal methods? Maybe they could then. We may have to add some assertions.
132+
133+
### Theorem: In order to get a tracked struct `ts` in revision R0, the tracked fn `f` that creates it must either *execute* or *be validated* first
134+
135+
The two points above combine to
136+
137+
138+
## Creating new values
139+
140+
Each new value is stored in a `salsa::alloc::Alloc` created by `StructMap::insert`.
141+
`Alloc` is a variant of the standard Rust `Box` that carries no uniqueness implications.
142+
This means that every tracked struct has its own allocation.
143+
This allocation is owned by the tracked struct ingredient
144+
and thus stays live until the tracked struct ingredient is dropped
145+
or until it is removed (see later for safety conditions around removal).
146+
147+
## The user type uses a raw pointer
148+
149+
The `#[salsa::tracked]` macro creates a user-exposed struct that looks roughly like this:
150+
151+
```rust
152+
// This struct is a wrapper around the actual fields that adds
153+
// some revision metadata. You can think of it as a newtype'd
154+
// version of the fields of the tracked struct.
155+
use salsa::tracked_struct::ValueStruct;
156+
157+
struct MyTrackedStruct<'db> {
158+
value: *const ValueStruct<..>,
159+
phantom: PhantomData<&'db ValueStruct<...>>
160+
}
161+
```
162+
163+
Key observations:
164+
165+
* The actual pointer to the `ValueStruct` used at runtime is not a Rust reference but a raw pointer. This is needed for stacked borrows.
166+
* A `PhantomData` is used to keep the `'db` lifetime alive.
167+
168+
The reason we use a raw pointer in the struct is because instances of this struct will outlive the `'db` lifetime. Consider this example:
169+
170+
```rust
171+
let mut db = MyDatabase::default();
172+
let input = MyInput::new(&mut db, ...);
173+
174+
// Revision 1:
175+
let result1 = tracked_fn(&db, input);
176+
177+
// Revision 2:
178+
input.set_field(&mut db).to(...);
179+
let result2 = tracked_fn(&db, input);
180+
```
181+
182+
Tracked structs created by `tracked_fn` during Revision 1
183+
may be reused during Revision 2, but the original `&db` reference
184+
used to create them has expired.
185+
If we stored a true Rust reference, that would be a violation of
186+
the stacked borrows rules.
187+
188+
Instead, we store a raw pointer and,
189+
whenever users invoke the accessor methods for particular fields,
190+
we create a new reference to the contents:
191+
192+
```rust
193+
impl<'db> MyTrackedStruct<'db> {
194+
fn field(self, db: &'db dyn DB) -> &'db FieldType {
195+
...
196+
}
197+
}
198+
```
199+
200+
This reference is linked to `db` and remains valid so long as the
201+
202+
## The `'db` lifetime at rest
203+
204+
## Updating tracked struct fields across revisions
205+
206+
### The `XX`
207+
208+
## Safety lemmas
209+
210+
These lemmas are used to justify the safety of the system.
211+
212+
### Using `MyTracked<'db>` within some revision R always "happens after' a call to `MyTracked::new`
213+
214+
Whenever a tracked struct instance `TS<'db>` is created for the first time in revision R1,
215+
the result is a fresh allocation and hence there cannot be any
216+
pre-existing aliases of that struct.
217+
218+
`TS<'db>` will at that time be stored into the salsa database.
219+
In later revisions, we assert that
220+
221+
### `&'db T` references are never stored in the database
222+
223+
224+
We maintain the invariant that, in any later revision R2,
225+
226+
However in some later revision R2, how
227+
228+
## Ways this could go wrong and how we prevent them
229+
230+
###
231+
232+
### Storing an `&'db T` into a field
233+
234+
235+
### Freeing the memory while a tracked struct remains live
236+
237+
238+
### Aliases of a tracked struct

book/src/plumbing/tracked_structs.md

+4-4
Original file line numberDiff line numberDiff line change
@@ -10,12 +10,12 @@ For a single tracked struct we create multiple ingredients.
1010
The **tracked struct ingredient** is the ingredient created first.
1111
It offers methods to create new instances of the struct and therefore
1212
has unique access to the interner and hashtables used to create the struct id.
13-
It also shares access to a hashtable that stores the `TrackedStructValue` that
13+
It also shares access to a hashtable that stores the `ValueStruct` that
1414
contains the field data.
1515

1616
For each field, we create a **tracked field ingredient** that moderates access
1717
to a particular field. All of these ingredients use that same shared hashtable
18-
to access the `TrackedStructValue` instance for a given id. The `TrackedStructValue`
18+
to access the `ValueStruct` instance for a given id. The `ValueStruct`
1919
contains both the field values but also the revisions when they last changed value.
2020

2121
## Each tracked struct has a globally unique id
@@ -26,13 +26,13 @@ This will begin by creating a *globally unique, 32-bit id* for the tracked struc
2626
* a u64 hash of the `#[id]` fields;
2727
* a *disambiguator* that makes this hash unique within the current query. i.e., when a query starts executing, it creates an empty map, and the first time a tracked struct with a given hash is created, it gets disambiguator 0. The next one will be given 1, etc.
2828

29-
## Each tracked struct has a `TrackedStructValue` storing its data
29+
## Each tracked struct has a `ValueStruct` storing its data
3030

3131
The struct and field ingredients share access to a hashmap that maps
3232
each field id to a value struct:
3333

3434
```rust,ignore
35-
{{#include ../../../components/salsa-2022/src/tracked_struct.rs:TrackedStructValue}}
35+
{{#include ../../../components/salsa-2022/src/tracked_struct.rs:ValueStruct}}
3636
```
3737

3838
The value struct stores the values of the fields but also the revisions when

book/src/reference/durability.md

+13
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,13 @@
1+
# Durability
2+
3+
"Durability" is an optimization that can greatly improve the performance of your salsa programs.
4+
Durability specifies the probability that an input's value will change.
5+
The default is "low durability".
6+
But when you set the value of an input, you can manually specify a higher durability,
7+
typically `Durability::HIGH`.
8+
Salsa tracks when tracked functions only consume values of high durability
9+
and, if no high durability input has changed, it can skip traversing their
10+
dependencies.
11+
12+
Typically "high durability" values are things like data read from the standard library
13+
or other inputs that aren't actively being edited by the end user.

0 commit comments

Comments
 (0)