-
Notifications
You must be signed in to change notification settings - Fork 170
allow reuse of cached provisional memos within the same cycle iteration #786
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
✅ Deploy Preview for salsa-rs canceled.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR enhances memoization by allowing the reuse of cached provisional memos within the same cycle iteration, preventing unnecessary re-execution of queries during cycle iterations. The key changes include:
- Updating the cycle head data structure to include an iteration count.
- Modifying functions such as push_query and reset_for to accept and propagate the iteration count.
- Adding tests to verify the reuse of provisional memos and adjusting cycle log expectations.
Reviewed Changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.
Show a summary per file
File | Description |
---|---|
tests/cycle_output.rs | Removed a duplicate log entry to align test outputs. |
tests/cycle_accumulate.rs | Removed a duplicate test check to avoid redundant assertions. |
tests/cycle.rs | Adjusted log length expectations and added a new test for provisional memo reuse. |
src/zalsa_local.rs | Updated push_query signature to include iteration count. |
src/function/memo.rs | Revised cycle head filtering to compare the inner database_key_index. |
src/function/maybe_changed_after.rs | Updated calls to CycleHeads and added validate_same_iteration. |
src/function/fetch.rs | Integrated the new memo validation and updated push_query calls. |
src/function/execute.rs | Updated cycle head iteration updates and push_query calls. |
src/function.rs | Changed cycle head membership test to iterate over CycleHead. |
src/cycle.rs | Redesigned CycleHeads to hold CycleHead (with iteration_count) and added update/extend logic. |
src/active_query.rs | Added iteration_count to ActiveQuery and updated related functions. |
Comments suppressed due to low confidence (1)
src/cycle.rs:174
- Review the assert comparing iteration counts in insert_into_impl; if differing iteration counts are ever expected when merging cycle heads, consider implementing a merging strategy or a more informative error instead of a hard assertion.
assert!(existing.iteration_count == head.iteration_count);
CodSpeed Performance ReportMerging #786 will not alter performanceComparing Summary
|
The last 3 commits on this PR represent 3 different viable working fixes, which should be evaluated against each other on maintainability and performance.
All 3 different versions are showing a small regression on a couple Salsa benchmarks. Oddly, the benchmarks showing regression are not the one with cycles, but this fix should have little or no impact when there are not cycles. This suggests the regression might be from compilation, e.g. an inlining change? 072ada7 reported the smallest regression. I've tried all 3 of these commits in release builds of red-knot, on a module with a cycle that previously appeared to hang (due to runaway re-execution) before this fix. The performance differences observed between the commits are small (3-4%) and not consistent: they seem to tend to report whichever version hyperfine runs first or second as slightly faster. Regarding maintainability, I think the options that walk the active query stack are better; correctly maintaining Based on this, I'm currently proposing 072ada7, which is the simplest version and shows the least regression on Salsa benchmarks, and no worse than any other version on the red-knot test. I welcome reviewer feedback here (and/or other suggestions for how to reclaim some perf.) |
I believe we are actually seeing allocator noise. You are changing the size of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with your choice of which strategy to prefer here 👍
#[derive(Clone, Debug, Default)] | ||
#[allow(clippy::box_collection)] | ||
pub struct CycleHeads(Option<Box<Vec<DatabaseKeyIndex>>>); | ||
pub struct CycleHeads(Option<Box<Vec<CycleHead>>>); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated to this PR (but something I just noticed we could investigate in a separate PR), we could check if using ThinVec
here might be better than a Box<Vec<_>>
(I think we have that dep nowadays)
Update to latest Salsa main branch, so as to get a baseline for measuring the perf effect of salsa-rs/salsa#786 on red-knot in isolation from other recent changes in Salsa main branch.
The Salsa benchmarks seem very noisy here and not useful. I just pushed two small changes (iterating active query stack in reverse, and making |
Also note that 1.86 released earlier today. So until you rebase onto latest main ( #781 ) you will likely bench between different rust versions. |
Rebased on latest main. |
Currently both Salsa and red-knot benchmarks are suggesting around a 1% regression here. I'm going to go ahead and merge; we can explore optimizations separately. |
Update to latest Salsa main branch, so as to get a baseline for measuring the perf effect of salsa-rs/salsa#786 on red-knot in isolation from other recent changes in Salsa main branch.
Currently, we don't allow provisional memos from cycle iteration to be reused as a memoized result. This is important in case another thread "peeks" in while we are still iterating a cycle (it shouldn't see a provisional result), and also so that iteration N+1 of a cycle doesn't reuse results from iteration N; it needs to recompute instead.
But this also means that if the same query is called many times within a single iteration of a cycle, we will re-execute it many times. For example, if query A calls query B, and query B calls query C 1000 times, and query C calls query A, each time we iterate this cycle we will execute query C 1000 times. This can lead to massive blow-up in execution times.
To fix this, we need to record in
cycle_heads
of a provisional memo the iteration-count for each cycle in which this memo was recorded. When validating whether a provisional memo can be used by the caller, we must allow it to be used only if we are still within the same iteration of those cycles, and not otherwise.This PR fixes an execution-time blow-up on red-knot that was bad enough that it looked like a hang.