fix(seer grouping): Store events with unusable Seer matches in Seer #89318

lobsterkatie · 2025-04-10T18:37:45Z

When we send an event to Seer during ingest, if Seer doesn't find a match, it stores the event's data in its database, so that future events can match with it. Conversely, if Seer does find a match, it doesn't store the new event's data, under the presumption that any future events which would match it would also match the event it matched.*

Now that we're sending events with hybrid fingerprints to Seer, though, that logic breaks down, since an event being a stacktrace distance match doesn't guarantee that it's an overall match, because either or both could have a hybrid fingerprint, and the non-default parts of those potential hybrid fingerprints might or might not match. In other words, we can no longer assume that the event already in Seer and the new event are equivalent in terms of matching to future events.

If neither the current event nor the existing event in Seer has a hybrid fingerprint, or if they both do and the non-default parts of their fingerprints match, the assumption holds. But if neither of those is the case, future events could match one but not the other, meaning both need to be represented in the Seer database.

This PR thus adds a second request to Seer to store the new event in the case where none of the results returned by Seer are an overall match. In order to force the event to be stored, the request sets the desired number of matches to 0, so that Seer sees the event as one with no matches. It also turns off re-ranking, in order to prevent Seer from pulling an initial list of matches to whittle down, since we know we're not going to use any of them anyway.

*In terms of stacktrace distance, this is not perfectly theoretically true - something just barely close enough to the new event to count might be just slightly too far from the existing event - but in practice those corner cases are rare enough that we don't worry about them.

codecov · 2025-04-10T19:08:23Z

Codecov Report

Attention: Patch coverage is 94.73684% with 1 line in your changes missing coverage. Please review.

✅ All tests successful. No failed tests found.

Files with missing lines	Patch %	Lines
src/sentry/seer/similarity/similar_issues.py	50.00%	1 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##           master   #89318       +/-   ##
===========================================
+ Coverage   63.57%   87.72%   +24.14%     
===========================================
  Files        4571    10125     +5554     
  Lines      137842   572496   +434654     
  Branches    22482    22482               
===========================================
+ Hits        87632   502201   +414569     
- Misses      49774    69859    +20085     
  Partials      436      436

…89318) When we send an event to Seer during ingest, if Seer doesn't find a match, it stores the event's data in its database, so that future events can match with it. Conversely, if Seer does find a match, it doesn't store the new event's data, under the presumption that any future events which would match it would also match the event it matched.* Now that we're sending events with hybrid fingerprints to Seer, though, that logic breaks down, since an event being a stacktrace distance match doesn't guarantee that it's an overall match, because either or both could have a hybrid fingerprint, and the non-default parts of those potential hybrid fingerprints might or might not match. In other words, we can no longer assume that the event already in Seer and the new event are equivalent in terms of matching to future events. If neither the current event nor the existing event in Seer has a hybrid fingerprint, or if they both do _and_ the non-default parts of their fingerprints match, the assumption holds. But if neither of those is the case, future events could match one but not the other, meaning both need to be represented in the Seer database. This PR thus adds a second request to Seer to store the new event in the case where none of the results returned by Seer are an overall match. In order to force the event to be stored, the request sets the desired number of matches to 0, so that Seer sees the event as one with no matches. It also turns off re-ranking, in order to prevent Seer from pulling an initial list of matches to whittle down, since we know we're not going to use any of them anyway. *In terms of stacktrace distance, this is not perfectly theoretically true - something just barely close enough to the new event to count might be just slightly too far from the existing event - but in practice those corner cases are rare enough that we don't worry about them.

JoshFerge · 2025-04-10T23:46:15Z

FYI: reverting due to suspected increase in database insert durations. need to determine root cause

getsentry-bot · 2025-04-10T23:46:21Z

PR reverted: da65e5f

…n Seer (#89318)" This reverts commit 682d434. Co-authored-by: JoshFerge <1976777+JoshFerge@users.noreply.github.com>

lobsterkatie · 2025-04-14T23:17:06Z

Redone in #89563.

…take 2) (#89563) This is a redo of #89318, which got reverted in the process of debugging some Seer latency issues. No changes from the original version, except that now the behavior is gated by an option, `seer.similarity.ingest.store_hybrid_fingerprint_non_matches`. The option defaults to `True`, so the effect of merging this will be the same as before, but we'll have an easier, faster way to turn it off this time, should we decide that's necessary. The original description is copied below for reference. ----- When we send an event to Seer during ingest, if Seer doesn't find a match, it stores the event's data in its database, so that future events can match with it. Conversely, if Seer does find a match, it doesn't store the new event's data, under the presumption that any future events which would match it would also match the event it matched.* Now that we're sending events with hybrid fingerprints to Seer, though, that logic breaks down, since an event being a stacktrace distance match doesn't guarantee that it's an overall match, because either or both could have a hybrid fingerprint, and the non-default parts of those potential hybrid fingerprints might or might not match. In other words, we can no longer assume that the event already in Seer and the new event are equivalent in terms of matching to future events. If neither the current event nor the existing event in Seer has a hybrid fingerprint, or if they both do _and_ the non-default parts of their fingerprints match, the assumption holds. But if neither of those is the case, future events could match one but not the other, meaning both need to be represented in the Seer database. This PR thus adds a second request to Seer to store the new event in the case where none of the results returned by Seer are an overall match. In order to force the event to be stored, the request sets the desired number of matches to 0, so that Seer sees the event as one with no matches. It also turns off re-ranking, in order to prevent Seer from pulling an initial list of matches to whittle down, since we know we're not going to use any of them anyway. *In terms of stacktrace distance, this is not perfectly theoretically true - something just barely close enough to the new event to count might be just slightly too far from the existing event - but in practice those corner cases are rare enough that we don't worry about them.

…89318) When we send an event to Seer during ingest, if Seer doesn't find a match, it stores the event's data in its database, so that future events can match with it. Conversely, if Seer does find a match, it doesn't store the new event's data, under the presumption that any future events which would match it would also match the event it matched.* Now that we're sending events with hybrid fingerprints to Seer, though, that logic breaks down, since an event being a stacktrace distance match doesn't guarantee that it's an overall match, because either or both could have a hybrid fingerprint, and the non-default parts of those potential hybrid fingerprints might or might not match. In other words, we can no longer assume that the event already in Seer and the new event are equivalent in terms of matching to future events. If neither the current event nor the existing event in Seer has a hybrid fingerprint, or if they both do _and_ the non-default parts of their fingerprints match, the assumption holds. But if neither of those is the case, future events could match one but not the other, meaning both need to be represented in the Seer database. This PR thus adds a second request to Seer to store the new event in the case where none of the results returned by Seer are an overall match. In order to force the event to be stored, the request sets the desired number of matches to 0, so that Seer sees the event as one with no matches. It also turns off re-ranking, in order to prevent Seer from pulling an initial list of matches to whittle down, since we know we're not going to use any of them anyway. *In terms of stacktrace distance, this is not perfectly theoretically true - something just barely close enough to the new event to count might be just slightly too far from the existing event - but in practice those corner cases are rare enough that we don't worry about them.

…n Seer (#89318)" This reverts commit 682d434. Co-authored-by: JoshFerge <1976777+JoshFerge@users.noreply.github.com>

…take 2) (#89563) This is a redo of #89318, which got reverted in the process of debugging some Seer latency issues. No changes from the original version, except that now the behavior is gated by an option, `seer.similarity.ingest.store_hybrid_fingerprint_non_matches`. The option defaults to `True`, so the effect of merging this will be the same as before, but we'll have an easier, faster way to turn it off this time, should we decide that's necessary. The original description is copied below for reference. ----- When we send an event to Seer during ingest, if Seer doesn't find a match, it stores the event's data in its database, so that future events can match with it. Conversely, if Seer does find a match, it doesn't store the new event's data, under the presumption that any future events which would match it would also match the event it matched.* Now that we're sending events with hybrid fingerprints to Seer, though, that logic breaks down, since an event being a stacktrace distance match doesn't guarantee that it's an overall match, because either or both could have a hybrid fingerprint, and the non-default parts of those potential hybrid fingerprints might or might not match. In other words, we can no longer assume that the event already in Seer and the new event are equivalent in terms of matching to future events. If neither the current event nor the existing event in Seer has a hybrid fingerprint, or if they both do _and_ the non-default parts of their fingerprints match, the assumption holds. But if neither of those is the case, future events could match one but not the other, meaning both need to be represented in the Seer database. This PR thus adds a second request to Seer to store the new event in the case where none of the results returned by Seer are an overall match. In order to force the event to be stored, the request sets the desired number of matches to 0, so that Seer sees the event as one with no matches. It also turns off re-ranking, in order to prevent Seer from pulling an initial list of matches to whittle down, since we know we're not going to use any of them anyway. *In terms of stacktrace distance, this is not perfectly theoretically true - something just barely close enough to the new event to count might be just slightly too far from the existing event - but in practice those corner cases are rare enough that we don't worry about them.

lobsterkatie added 3 commits April 10, 2025 09:29

make second request to Seer if results are unusable

29c0ecd

add logs to confirm behavior

84e2aa2

add test

807e89a

github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Apr 10, 2025

vercel bot deployed to Preview April 10, 2025 18:41 View deployment

lobsterkatie marked this pull request as ready for review April 10, 2025 19:04

lobsterkatie requested review from a team as code owners April 10, 2025 19:04

JoshFerge approved these changes Apr 10, 2025

View reviewed changes

lobsterkatie merged commit 682d434 into master Apr 10, 2025
60 checks passed

lobsterkatie deleted the kmclb-send-event-back-to-seer-if-matches-unusable branch April 10, 2025 19:08

JoshFerge added the Trigger: Revert Add to a merged PR to revert it (skips CI) label Apr 10, 2025

getsentry-bot added a commit that referenced this pull request Apr 10, 2025

Revert "fix(seer grouping): Store events with unusable Seer matches i…

da65e5f

…n Seer (#89318)" This reverts commit 682d434. Co-authored-by: JoshFerge <1976777+JoshFerge@users.noreply.github.com>

lobsterkatie mentioned this pull request Apr 14, 2025

fix(seer grouping): Store events with unusable Seer matches in Seer (take 2) #89563

Merged

andrewshie-sentry pushed a commit that referenced this pull request Apr 22, 2025

Revert "fix(seer grouping): Store events with unusable Seer matches i…

2033b5a

…n Seer (#89318)" This reverts commit 682d434. Co-authored-by: JoshFerge <1976777+JoshFerge@users.noreply.github.com>

github-actions bot locked and limited conversation to collaborators Apr 30, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(seer grouping): Store events with unusable Seer matches in Seer #89318

fix(seer grouping): Store events with unusable Seer matches in Seer #89318

lobsterkatie commented Apr 10, 2025

codecov bot commented Apr 10, 2025 •

edited

Loading

JoshFerge commented Apr 10, 2025

getsentry-bot commented Apr 10, 2025

lobsterkatie commented Apr 14, 2025

fix(seer grouping): Store events with unusable Seer matches in Seer #89318

fix(seer grouping): Store events with unusable Seer matches in Seer #89318

Conversation

lobsterkatie commented Apr 10, 2025

codecov bot commented Apr 10, 2025 • edited Loading

Codecov Report

JoshFerge commented Apr 10, 2025

getsentry-bot commented Apr 10, 2025

lobsterkatie commented Apr 14, 2025

codecov bot commented Apr 10, 2025 •

edited

Loading