fix(perf-issues): Use parameterized URL for similarities on N+1 API Calls #89362

leeandher · 2025-04-10T23:59:00Z

Sanity check here, but I believe these parameterized URLs are already being used for fingerprinting, but they are not being used for checking similarity. Instead we are checking hashes from span_hashes which is generated from a pinned function which does not attempt to parameterize, meaning the hash of /posts/1/comments/ is not going to match /posts/2/comments/.

Rather than hashing the URLs, we can compare them directly after parameterization, in order to increase how often we can fingerprint. Added tests for this new behaviour and adjusted some previous ones.

As per the recommendation, I think I'm going to have this shadow for a while before actually implementing this to ensure we don't cause any fingerprint explosions.

This will allow for N+1 API calls to trigger when we process a transaction that does something like:

for (let i = 0; i < 20; i++) { 
  await fetch(`https://example.com/posts/${i}/`)
}

rather than just for query params like:

for (let i = 0; i < 20; i++) { 
  await fetch(`https://example.com/posts/?id=${i}`)
}

src/sentry/utils/performance_issues/detectors/n_plus_one_api_calls_detector.py

codecov · 2025-04-11T00:16:35Z

Codecov Report

Attention: Patch coverage is 97.10145% with 2 lines in your changes missing coverage. Please review.

✅ All tests successful. No failed tests found.

Files with missing lines	Patch %	Lines
..._issues/detectors/n_plus_one_api_calls_detector.py	88.88%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master   #89362      +/-   ##
==========================================
- Coverage   85.26%   85.26%   -0.01%     
==========================================
  Files       10172    10172              
  Lines      574286   574301      +15     
  Branches    22636    22636              
==========================================
+ Hits       489671   489680       +9     
- Misses      84070    84076       +6     
  Partials      545      545

gggritso · 2025-04-11T19:20:07Z

🤔 This looks pretty good to me! Interesting find, for sure.

I don't remember why the span similarity is done using hashes rather than parameterized URLs. I'm guessing it's something that slipped through the cracks in #43506 and I never noticed. Comparing the parameterized URLs seems okay to me, it's just hard to predict exactly what'll happen, and I'm paranoid. I think this is good?

re: span["hash"]. Here is how/when they're set:

sentry/src/sentry/event_manager.py

Line 2451 in 321dc2c

groupings.write_to_event(event.data)

I think you'd have to be pretty careful about changing those. N+1 API Calls won't be affected since it makes its own hashes for fingerprints, but other detectors might rely on span["hash"] for fingerprints, and if the hashes change everyone's going to get new issues. Should be easy to check, though!

These span hashes are also used for the "suspect spans" feature, though I don't know if that feature is going to stick around

Lastly, I'm not sure if you're worried about rolling this change out (I'm not sure if you need to be, but maybe?) but there's always a risk of creating a bunch of unwanted issues for people.

If you want to be rigorous, what we've done in the past is copy/paste a detector, register it, run it silently for a while in parallel with the original and check the output via telemetry. If the output looks good, then you can add those changes to the current detector.

I'm not saying this PR needs that, but it's worth considering.

leeandher · 2025-04-23T15:05:37Z

Closing in favour of #90070

…90070) For context, I'm auditing the current performance issue detectors and will likely be making some changes to them. Risky changes like touching fingerprinting should probably run silently beforehand to ensure we catch false positives or fingerprint explosion (see [guidance](#89362 (comment))). In this PR, I'm creating a structure to make these experiments easier to review and colocated since I'll be working on different detectors in parallel. Going forward, my workflow will probably be the following: 1. Copy the detector and test files into `/experiments` and set up a dummy group type. This copy will be identical to the current detector. (1st PR, This one) 2. Apply experimental changes in a follow up PR (2nd PR, making it easier to review) 3. Add the options via `sentry createissueflag` ([docs](https://develop.sentry.dev/backend/issue-platform/#releasing-your-issue-type)) to run in prod (3rd PR, in `options-automator`) 5. If needed, follow up with a fix/change to the experimental detector. Repeat as needed 6. When happy with the output, apply changes to the real detector and delete experiment files. The reason I'm being so formal about this all is because I'll be parallelizing these experiments so I gotta keep this straight for myself as well 👍

github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Apr 10, 2025

leeandher commented Apr 11, 2025

View reviewed changes

src/sentry/utils/performance_issues/detectors/n_plus_one_api_calls_detector.py Outdated Show resolved Hide resolved

leeandher force-pushed the leander/alter-n-plus-one-api-call-hashing branch from f2f114f to b4b185a Compare April 14, 2025 17:46

vercel bot deployed to Preview April 14, 2025 17:51 View deployment

vercel bot deployed to Preview April 14, 2025 21:01 View deployment

leeandher added 2 commits April 15, 2025 12:02

🚧 Use the parameterized URL for checking similarity

9710589

♻️ better evidence data, tests

749da92

leeandher force-pushed the leander/alter-n-plus-one-api-call-hashing branch from fca2156 to 749da92 Compare April 15, 2025 16:02

vercel bot deployed to Preview April 15, 2025 16:08 View deployment

This was referenced Apr 15, 2025

chore(issues): Setup experiemental N+1 API Call detector #89633

Closed

chore(performance-issues): Create experimental N+1 API Call detector #90070

Merged

leeandher closed this Apr 23, 2025

leeandher deleted the leander/alter-n-plus-one-api-call-hashing branch April 23, 2025 15:13

github-actions bot locked and limited conversation to collaborators May 9, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(perf-issues): Use parameterized URL for similarities on N+1 API Calls #89362

fix(perf-issues): Use parameterized URL for similarities on N+1 API Calls #89362

leeandher commented Apr 10, 2025 •

edited

Loading

codecov bot commented Apr 11, 2025 •

edited

Loading

gggritso commented Apr 11, 2025

leeandher commented Apr 23, 2025

fix(perf-issues): Use parameterized URL for similarities on N+1 API Calls #89362

fix(perf-issues): Use parameterized URL for similarities on N+1 API Calls #89362

Conversation

leeandher commented Apr 10, 2025 • edited Loading

codecov bot commented Apr 11, 2025 • edited Loading

Codecov Report

gggritso commented Apr 11, 2025

leeandher commented Apr 23, 2025

leeandher commented Apr 10, 2025 •

edited

Loading

codecov bot commented Apr 11, 2025 •

edited

Loading