
allow reuse of cached provisional memos within the same cycle iteration #786


Merged
merged 15 commits into salsa-rs:master from carljm:cacheprovisional
Apr 3, 2025

Conversation

@carljm carljm (Contributor) commented Apr 2, 2025

Currently, we don't allow provisional memos created during cycle iteration to be reused as memoized results. This is important in case another thread "peeks" in while we are still iterating a cycle (it shouldn't see a provisional result), and also so that iteration N+1 of a cycle doesn't reuse results from iteration N; it needs to recompute them instead.

But this also means that if the same query is called many times within a single iteration of a cycle, we will re-execute it many times. For example, if query A calls query B, and query B calls query C 1000 times, and query C calls query A, each time we iterate this cycle we will execute query C 1000 times. This can lead to a massive blow-up in execution times.

To fix this, we need to record, in the cycle_heads of a provisional memo, the iteration count of each cycle in which the memo was recorded. When validating whether a provisional memo can be reused by a caller, we must allow reuse only if we are still within the same iteration of those cycles, and not otherwise.
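As a rough sketch of the bookkeeping this implies (the names are simplified from the PR, and DatabaseKeyIndex is reduced to a stand-in newtype so the sketch is self-contained; the real definitions live in src/cycle.rs):

// Stand-in for Salsa's real key type, simplified for illustration only.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct DatabaseKeyIndex(u32);

// A cycle head now carries the iteration of the cycle during which the
// provisional memo was recorded, not just the head's key.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct CycleHead {
    pub database_key_index: DatabaseKeyIndex,
    pub iteration_count: u32,
}

// Each provisional memo records the cycle heads (and their iteration counts)
// it was computed under; reuse is allowed only while all of them still match.
#[derive(Clone, Debug, Default)]
pub struct CycleHeads(Option<Box<Vec<CycleHead>>>);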

This PR fixes an execution-time blow-up on red-knot that was bad enough that it looked like a hang.


netlify bot commented Apr 2, 2025

Deploy Preview for salsa-rs canceled.

Latest commit: 1d0f7e7
Latest deploy log: https://app.netlify.com/sites/salsa-rs/deploys/67eea2e96a44e80008e73193

@Copilot Copilot AI left a comment

Pull Request Overview

This PR enhances memoization by allowing the reuse of cached provisional memos within the same cycle iteration, preventing unnecessary re-execution of queries during cycle iterations. The key changes include:

  • Updating the cycle head data structure to include an iteration count.
  • Modifying functions such as push_query and reset_for to accept and propagate the iteration count.
  • Adding tests to verify the reuse of provisional memos and adjusting cycle log expectations.

Reviewed Changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

Summary per file:

  • tests/cycle_output.rs: Removed a duplicate log entry to align test outputs.
  • tests/cycle_accumulate.rs: Removed a duplicate test check to avoid redundant assertions.
  • tests/cycle.rs: Adjusted log length expectations and added a new test for provisional memo reuse.
  • src/zalsa_local.rs: Updated the push_query signature to include the iteration count.
  • src/function/memo.rs: Revised cycle head filtering to compare the inner database_key_index.
  • src/function/maybe_changed_after.rs: Updated calls to CycleHeads and added validate_same_iteration.
  • src/function/fetch.rs: Integrated the new memo validation and updated push_query calls.
  • src/function/execute.rs: Updated cycle head iteration updates and push_query calls.
  • src/function.rs: Changed the cycle head membership test to iterate over CycleHead.
  • src/cycle.rs: Redesigned CycleHeads to hold CycleHead (with iteration_count) and added update/extend logic.
  • src/active_query.rs: Added iteration_count to ActiveQuery and updated related functions.
Comments suppressed due to low confidence (1)

src/cycle.rs:174

  • Review the assert comparing iteration counts in insert_into_impl; if differing iteration counts are ever expected when merging cycle heads, consider implementing a merging strategy or a more informative error instead of a hard assertion.
assert!(existing.iteration_count == head.iteration_count);
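(Not part of this PR, but if a more informative failure were wanted here, one small option is assert_eq!, which prints both iteration counts when the check trips, rather than a bare boolean assertion:)

// Hypothetical variant of the same check, reporting both values on panic.
assert_eq!(existing.iteration_count, head.iteration_count);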


codspeed-hq bot commented Apr 2, 2025

CodSpeed Performance Report

Merging #786 will not alter performance

Comparing carljm:cacheprovisional (1d0f7e7) with master (395b29d)

Summary

✅ 12 untouched benchmarks

@carljm carljm (Contributor, Author) commented Apr 2, 2025

The last 3 commits on this PR represent 3 different viable working fixes, which should be evaluated against each other on maintainability and performance.

  • The most recent commit (072ada7) walks the active query stack once per cycle head in a provisional memo, to check whether each cycle head is present in the active query stack (see the sketch after this list).
  • The second-to-last commit (484d6dd) only walks the active query stack once, but iterates the provisional memo's cycle heads once per entry in the active query stack.
  • The 3rd-from-last commit (4ea3d85) instead tracks the currently-iterating cycles in a new active_cycles field on ZalsaLocal, and just checks whether that field contains all cycle heads in the memo.
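
For illustration, a minimal sketch of the shape of the 072ada7 stack-walking check, reusing the stand-in CycleHead and DatabaseKeyIndex types from the sketch in the PR description above (in the real code this is a method that consults ZalsaLocal's active query stack, whose frames carry much more state):

// Stand-in for one frame on the active query stack.
pub struct ActiveQueryFrame {
    pub database_key_index: DatabaseKeyIndex,
    pub iteration_count: u32,
}

// A provisional memo is reusable only if every cycle head it was recorded
// under is currently on the stack *and* still on the same iteration.
// This walks the stack once per cycle head in the memo.
pub fn validate_same_iteration(
    memo_cycle_heads: &[CycleHead],
    active_query_stack: &[ActiveQueryFrame],
) -> bool {
    memo_cycle_heads.iter().all(|head| {
        active_query_stack.iter().any(|frame| {
            frame.database_key_index == head.database_key_index
                && frame.iteration_count == head.iteration_count
        })
    })
}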

All 3 versions show a small regression on a couple of Salsa benchmarks. Oddly, the benchmarks showing a regression are not the one with cycles, yet this fix should have little or no impact when there are no cycles. This suggests the regression might come from compilation, e.g. an inlining change? 072ada7 reported the smallest regression.

I've tried all 3 of these commits in release builds of red-knot, on a module with a cycle that previously appeared to hang (due to runaway re-execution) before this fix. The performance differences between the commits are small (3-4%) and not consistent: hyperfine tends to report whichever version it happens to run first or second as slightly faster.

Regarding maintainability, I think the options that walk the active query stack are better; correctly maintaining ZalsaLocal::active_cycles is a bit finicky and took me a while to get right (and I think it's still not correct in the face of cancellation). Walking the active query stack is a more reliable and easy-to-maintain way to know which cycle heads are currently executing.

Based on this, I'm currently proposing 072ada7, which is the simplest version and shows the least regression on Salsa benchmarks, and no worse than any other version on the red-knot test.

I welcome reviewer feedback here (and/or other suggestions for how to reclaim some perf).

@carljm carljm requested a review from nikomatsakis April 2, 2025 23:56
@Veykril Veykril (Member) commented Apr 3, 2025

Oddly, the benchmarks showing a regression are not the one with cycles, yet this fix should have little or no impact when there are no cycles. This suggests the regression might come from compilation, e.g. an inlining change?

I believe we are actually seeing allocator noise. You are changing the size of ActiveQuery (I think), which is heap allocated, so this hits the allocator differently (and the regressions in those benches are also mostly in allocating paths). I've noticed this in a couple of PRs that affect allocation size and/or order.
Either way, I think we can ignore those regressions; they look very dependent on heap allocation here.

@Veykril Veykril (Member) left a comment

I agree with your choice of which strategy to prefer here 👍

#[derive(Clone, Debug, Default)]
#[allow(clippy::box_collection)]
-pub struct CycleHeads(Option<Box<Vec<DatabaseKeyIndex>>>);
+pub struct CycleHeads(Option<Box<Vec<CycleHead>>>);

Unrelated to this PR (but something I just noticed we could investigate in a separate PR), we could check if using ThinVec here might be better than a Box<Vec<_>> (I think we have that dep nowadays)
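
For reference, a hedged sketch of what that alternative might look like (this assumes the thin_vec crate is available as a dependency, as the comment suggests, and reuses the CycleHead stand-in from the earlier sketch; it is not part of this PR):

use thin_vec::ThinVec;

// ThinVec<T> is a single pointer wide and stores length/capacity on the heap,
// so an empty CycleHeads would not need the Option<Box<..>> wrapping above.
#[derive(Clone, Debug, Default)]
pub struct CycleHeads(ThinVec<CycleHead>);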

carljm added a commit to astral-sh/ruff that referenced this pull request Apr 3, 2025
Update to latest Salsa main branch, so as to get a baseline for
measuring the perf effect of salsa-rs/salsa#786
on red-knot in isolation from other recent changes in Salsa main branch.
@carljm carljm (Contributor, Author) commented Apr 3, 2025

The Salsa benchmarks seem very noisy here and not useful. I just pushed two small changes (iterating the active query stack in reverse, and making ActiveQuery::iteration_count private with an accessor) which should not impact the performance of any benchmark other than converge_diverge (the one with cycles, which already showed no impact), and now the perf regression on the other two input benchmarks is gone. I think this supports @Veykril's theory that it's allocator noise and not really meaningful.

@Veykril Veykril (Member) commented Apr 3, 2025

Also note that Rust 1.86 was released earlier today, so until you rebase onto latest main (#781) you will likely be benchmarking across different Rust versions.

@carljm carljm force-pushed the cacheprovisional branch from 13d8d89 to 05bbbd3 on April 3, 2025 14:41
@carljm carljm (Contributor, Author) commented Apr 3, 2025

Rebased on latest main.

@carljm carljm (Contributor, Author) commented Apr 3, 2025

Currently both Salsa and red-knot benchmarks are suggesting around a 1% regression here. I'm going to go ahead and merge; we can explore optimizations separately.

@carljm carljm enabled auto-merge April 3, 2025 15:10
@carljm carljm added this pull request to the merge queue Apr 3, 2025
Merged via the queue into salsa-rs:master with commit 296a8c7 Apr 3, 2025
11 checks passed
@carljm carljm deleted the cacheprovisional branch April 3, 2025 15:20
maxmynter pushed a commit to maxmynter/ruff that referenced this pull request Apr 3, 2025
Update to latest Salsa main branch, so as to get a baseline for
measuring the perf effect of salsa-rs/salsa#786
on red-knot in isolation from other recent changes in Salsa main branch.