Skip to content

fix(seer grouping): Store events with unusable Seer matches in Seer #89318

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

lobsterkatie
Copy link
Member

When we send an event to Seer during ingest, if Seer doesn't find a match, it stores the event's data in its database, so that future events can match with it. Conversely, if Seer does find a match, it doesn't store the new event's data, under the presumption that any future events which would match it would also match the event it matched.*

Now that we're sending events with hybrid fingerprints to Seer, though, that logic breaks down, since an event being a stacktrace distance match doesn't guarantee that it's an overall match, because either or both could have a hybrid fingerprint, and the non-default parts of those potential hybrid fingerprints might or might not match. In other words, we can no longer assume that the event already in Seer and the new event are equivalent in terms of matching to future events.

If neither the current event nor the existing event in Seer has a hybrid fingerprint, or if they both do and the non-default parts of their fingerprints match, the assumption holds. But if neither of those is the case, future events could match one but not the other, meaning both need to be represented in the Seer database.

This PR thus adds a second request to Seer to store the new event in the case where none of the results returned by Seer are an overall match. In order to force the event to be stored, the request sets the desired number of matches to 0, so that Seer sees the event as one with no matches. It also turns off re-ranking, in order to prevent Seer from pulling an initial list of matches to whittle down, since we know we're not going to use any of them anyway.

*In terms of stacktrace distance, this is not perfectly theoretically true - something just barely close enough to the new event to count might be just slightly too far from the existing event - but in practice those corner cases are rare enough that we don't worry about them.

@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Apr 10, 2025
@lobsterkatie lobsterkatie marked this pull request as ready for review April 10, 2025 19:04
@lobsterkatie lobsterkatie requested review from a team as code owners April 10, 2025 19:04
@lobsterkatie lobsterkatie merged commit 682d434 into master Apr 10, 2025
60 checks passed
@lobsterkatie lobsterkatie deleted the kmclb-send-event-back-to-seer-if-matches-unusable branch April 10, 2025 19:08
Copy link

codecov bot commented Apr 10, 2025

Codecov Report

Attention: Patch coverage is 94.73684% with 1 line in your changes missing coverage. Please review.

✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
src/sentry/seer/similarity/similar_issues.py 50.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           master   #89318       +/-   ##
===========================================
+ Coverage   63.57%   87.72%   +24.14%     
===========================================
  Files        4571    10125     +5554     
  Lines      137842   572496   +434654     
  Branches    22482    22482               
===========================================
+ Hits        87632   502201   +414569     
- Misses      49774    69859    +20085     
  Partials      436      436               

adrian-codecov pushed a commit that referenced this pull request Apr 10, 2025
…89318)

When we send an event to Seer during ingest, if Seer doesn't find a match, it stores the event's data in its database, so that future events can match with it. Conversely, if Seer does find a match, it doesn't store the new event's data, under the presumption that any future events which would match it would also match the event it matched.*
  
Now that we're sending events with hybrid fingerprints to Seer, though, that logic breaks down, since an event being a stacktrace distance match doesn't guarantee that it's an overall match, because either or both could have a hybrid fingerprint, and the non-default parts of those potential hybrid fingerprints might or might not match. In other words, we can no longer assume that the event already in Seer and the new event are equivalent in terms of matching to future events. 

If neither the current event nor the existing event in Seer has a hybrid fingerprint, or if they both do _and_ the non-default parts of their fingerprints match, the assumption holds. But if neither of those is the case, future events could match one but not the other, meaning both need to be represented in the Seer database.

This PR thus adds a second request to Seer to store the new event in the case where none of the results returned by Seer are an overall match. In order to force the event to be stored, the request sets the desired number of matches to 0, so that Seer sees the event as one with no matches. It also turns off re-ranking, in order to prevent Seer from pulling an initial list of matches to whittle down, since we know we're not going to use any of them anyway.


*In terms of stacktrace distance, this is not perfectly theoretically true - something just barely close enough to the new event to count might be just slightly too far from the existing event - but in practice those corner cases are rare enough that we don't worry about them.
@JoshFerge JoshFerge added the Trigger: Revert Add to a merged PR to revert it (skips CI) label Apr 10, 2025
@JoshFerge
Copy link
Member

FYI: reverting due to suspected increase in database insert durations. need to determine root cause

@getsentry-bot
Copy link
Contributor

PR reverted: da65e5f

getsentry-bot added a commit that referenced this pull request Apr 10, 2025
…n Seer (#89318)"

This reverts commit 682d434.

Co-authored-by: JoshFerge <1976777+JoshFerge@users.noreply.github.com>
@lobsterkatie
Copy link
Member Author

Redone in #89563.

lobsterkatie added a commit that referenced this pull request Apr 15, 2025
…take 2) (#89563)

This is a redo of #89318, which got reverted in the process of debugging some Seer latency issues. No changes from the original version, except that now the behavior is gated by an option, `seer.similarity.ingest.store_hybrid_fingerprint_non_matches`. The option defaults to `True`, so the effect of merging this will be the same as before, but we'll have an easier, faster way to turn it off this time, should we decide that's necessary.

The original description is copied below for reference.

-----

When we send an event to Seer during ingest, if Seer doesn't find a match, it stores the event's data in its database, so that future events can match with it. Conversely, if Seer does find a match, it doesn't store the new event's data, under the presumption that any future events which would match it would also match the event it matched.*
  
Now that we're sending events with hybrid fingerprints to Seer, though, that logic breaks down, since an event being a stacktrace distance match doesn't guarantee that it's an overall match, because either or both could have a hybrid fingerprint, and the non-default parts of those potential hybrid fingerprints might or might not match. In other words, we can no longer assume that the event already in Seer and the new event are equivalent in terms of matching to future events. 

If neither the current event nor the existing event in Seer has a hybrid fingerprint, or if they both do _and_ the non-default parts of their fingerprints match, the assumption holds. But if neither of those is the case, future events could match one but not the other, meaning both need to be represented in the Seer database.

This PR thus adds a second request to Seer to store the new event in the case where none of the results returned by Seer are an overall match. In order to force the event to be stored, the request sets the desired number of matches to 0, so that Seer sees the event as one with no matches. It also turns off re-ranking, in order to prevent Seer from pulling an initial list of matches to whittle down, since we know we're not going to use any of them anyway.


*In terms of stacktrace distance, this is not perfectly theoretically true - something just barely close enough to the new event to count might be just slightly too far from the existing event - but in practice those corner cases are rare enough that we don't worry about them.
andrewshie-sentry pushed a commit that referenced this pull request Apr 22, 2025
…89318)

When we send an event to Seer during ingest, if Seer doesn't find a match, it stores the event's data in its database, so that future events can match with it. Conversely, if Seer does find a match, it doesn't store the new event's data, under the presumption that any future events which would match it would also match the event it matched.*
  
Now that we're sending events with hybrid fingerprints to Seer, though, that logic breaks down, since an event being a stacktrace distance match doesn't guarantee that it's an overall match, because either or both could have a hybrid fingerprint, and the non-default parts of those potential hybrid fingerprints might or might not match. In other words, we can no longer assume that the event already in Seer and the new event are equivalent in terms of matching to future events. 

If neither the current event nor the existing event in Seer has a hybrid fingerprint, or if they both do _and_ the non-default parts of their fingerprints match, the assumption holds. But if neither of those is the case, future events could match one but not the other, meaning both need to be represented in the Seer database.

This PR thus adds a second request to Seer to store the new event in the case where none of the results returned by Seer are an overall match. In order to force the event to be stored, the request sets the desired number of matches to 0, so that Seer sees the event as one with no matches. It also turns off re-ranking, in order to prevent Seer from pulling an initial list of matches to whittle down, since we know we're not going to use any of them anyway.


*In terms of stacktrace distance, this is not perfectly theoretically true - something just barely close enough to the new event to count might be just slightly too far from the existing event - but in practice those corner cases are rare enough that we don't worry about them.
andrewshie-sentry pushed a commit that referenced this pull request Apr 22, 2025
…n Seer (#89318)"

This reverts commit 682d434.

Co-authored-by: JoshFerge <1976777+JoshFerge@users.noreply.github.com>
andrewshie-sentry pushed a commit that referenced this pull request Apr 22, 2025
…take 2) (#89563)

This is a redo of #89318, which got reverted in the process of debugging some Seer latency issues. No changes from the original version, except that now the behavior is gated by an option, `seer.similarity.ingest.store_hybrid_fingerprint_non_matches`. The option defaults to `True`, so the effect of merging this will be the same as before, but we'll have an easier, faster way to turn it off this time, should we decide that's necessary.

The original description is copied below for reference.

-----

When we send an event to Seer during ingest, if Seer doesn't find a match, it stores the event's data in its database, so that future events can match with it. Conversely, if Seer does find a match, it doesn't store the new event's data, under the presumption that any future events which would match it would also match the event it matched.*
  
Now that we're sending events with hybrid fingerprints to Seer, though, that logic breaks down, since an event being a stacktrace distance match doesn't guarantee that it's an overall match, because either or both could have a hybrid fingerprint, and the non-default parts of those potential hybrid fingerprints might or might not match. In other words, we can no longer assume that the event already in Seer and the new event are equivalent in terms of matching to future events. 

If neither the current event nor the existing event in Seer has a hybrid fingerprint, or if they both do _and_ the non-default parts of their fingerprints match, the assumption holds. But if neither of those is the case, future events could match one but not the other, meaning both need to be represented in the Seer database.

This PR thus adds a second request to Seer to store the new event in the case where none of the results returned by Seer are an overall match. In order to force the event to be stored, the request sets the desired number of matches to 0, so that Seer sees the event as one with no matches. It also turns off re-ranking, in order to prevent Seer from pulling an initial list of matches to whittle down, since we know we're not going to use any of them anyway.


*In terms of stacktrace distance, this is not perfectly theoretically true - something just barely close enough to the new event to count might be just slightly too far from the existing event - but in practice those corner cases are rare enough that we don't worry about them.
@github-actions github-actions bot locked and limited conversation to collaborators Apr 30, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Scope: Backend Automatically applied to PRs that change backend components Trigger: Revert Add to a merged PR to revert it (skips CI)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants