Skip to content

fix(seer grouping): Store events with unusable Seer matches in Seer (take 2) #89563

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

lobsterkatie
Copy link
Member

This is a redo of #89318, which got reverted in the process of debugging some Seer latency issues. No changes from the original version, except that now the behavior is gated by an option, seer.similarity.ingest.store_hybrid_fingerprint_non_matches. The option defaults to True, so the effect of merging this will be the same as before, but we'll have an easier, faster way to turn it off this time, should we decide that's necessary.

The original description is copied below for reference.


When we send an event to Seer during ingest, if Seer doesn't find a match, it stores the event's data in its database, so that future events can match with it. Conversely, if Seer does find a match, it doesn't store the new event's data, under the presumption that any future events which would match it would also match the event it matched.*

Now that we're sending events with hybrid fingerprints to Seer, though, that logic breaks down, since an event being a stacktrace distance match doesn't guarantee that it's an overall match, because either or both could have a hybrid fingerprint, and the non-default parts of those potential hybrid fingerprints might or might not match. In other words, we can no longer assume that the event already in Seer and the new event are equivalent in terms of matching to future events.

If neither the current event nor the existing event in Seer has a hybrid fingerprint, or if they both do and the non-default parts of their fingerprints match, the assumption holds. But if neither of those is the case, future events could match one but not the other, meaning both need to be represented in the Seer database.

This PR thus adds a second request to Seer to store the new event in the case where none of the results returned by Seer are an overall match. In order to force the event to be stored, the request sets the desired number of matches to 0, so that Seer sees the event as one with no matches. It also turns off re-ranking, in order to prevent Seer from pulling an initial list of matches to whittle down, since we know we're not going to use any of them anyway.

*In terms of stacktrace distance, this is not perfectly theoretically true - something just barely close enough to the new event to count might be just slightly too far from the existing event - but in practice those corner cases are rare enough that we don't worry about them.

@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Apr 14, 2025
Copy link

codecov bot commented Apr 14, 2025

Codecov Report

Attention: Patch coverage is 95.00000% with 1 line in your changes missing coverage. Please review.

✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
src/sentry/seer/similarity/similar_issues.py 50.00% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master   #89563   +/-   ##
=======================================
  Coverage   87.73%   87.73%           
=======================================
  Files       10171    10171           
  Lines      574093   574132   +39     
  Branches    22614    22614           
=======================================
+ Hits       503678   503721   +43     
+ Misses      69999    69995    -4     
  Partials      416      416           

@lobsterkatie lobsterkatie merged commit 6d26015 into master Apr 15, 2025
61 checks passed
@lobsterkatie lobsterkatie deleted the kmclb-send-event-back-to-seer-if-matches-unusable-take-2 branch April 15, 2025 19:04
andrewshie-sentry pushed a commit that referenced this pull request Apr 22, 2025
…take 2) (#89563)

This is a redo of #89318, which got reverted in the process of debugging some Seer latency issues. No changes from the original version, except that now the behavior is gated by an option, `seer.similarity.ingest.store_hybrid_fingerprint_non_matches`. The option defaults to `True`, so the effect of merging this will be the same as before, but we'll have an easier, faster way to turn it off this time, should we decide that's necessary.

The original description is copied below for reference.

-----

When we send an event to Seer during ingest, if Seer doesn't find a match, it stores the event's data in its database, so that future events can match with it. Conversely, if Seer does find a match, it doesn't store the new event's data, under the presumption that any future events which would match it would also match the event it matched.*
  
Now that we're sending events with hybrid fingerprints to Seer, though, that logic breaks down, since an event being a stacktrace distance match doesn't guarantee that it's an overall match, because either or both could have a hybrid fingerprint, and the non-default parts of those potential hybrid fingerprints might or might not match. In other words, we can no longer assume that the event already in Seer and the new event are equivalent in terms of matching to future events. 

If neither the current event nor the existing event in Seer has a hybrid fingerprint, or if they both do _and_ the non-default parts of their fingerprints match, the assumption holds. But if neither of those is the case, future events could match one but not the other, meaning both need to be represented in the Seer database.

This PR thus adds a second request to Seer to store the new event in the case where none of the results returned by Seer are an overall match. In order to force the event to be stored, the request sets the desired number of matches to 0, so that Seer sees the event as one with no matches. It also turns off re-ranking, in order to prevent Seer from pulling an initial list of matches to whittle down, since we know we're not going to use any of them anyway.


*In terms of stacktrace distance, this is not perfectly theoretically true - something just barely close enough to the new event to count might be just slightly too far from the existing event - but in practice those corner cases are rare enough that we don't worry about them.
@github-actions github-actions bot locked and limited conversation to collaborators May 1, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Scope: Backend Automatically applied to PRs that change backend components
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants