-
-
Notifications
You must be signed in to change notification settings - Fork 4.3k
fix(seer grouping): Store events with unusable Seer matches in Seer #89318
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
lobsterkatie
merged 3 commits into
master
from
kmclb-send-event-back-to-seer-if-matches-unusable
Apr 10, 2025
Merged
fix(seer grouping): Store events with unusable Seer matches in Seer #89318
lobsterkatie
merged 3 commits into
master
from
kmclb-send-event-back-to-seer-if-matches-unusable
Apr 10, 2025
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
JoshFerge
approved these changes
Apr 10, 2025
Codecov ReportAttention: Patch coverage is ✅ All tests successful. No failed tests found.
Additional details and impacted files@@ Coverage Diff @@
## master #89318 +/- ##
===========================================
+ Coverage 63.57% 87.72% +24.14%
===========================================
Files 4571 10125 +5554
Lines 137842 572496 +434654
Branches 22482 22482
===========================================
+ Hits 87632 502201 +414569
- Misses 49774 69859 +20085
Partials 436 436 |
adrian-codecov
pushed a commit
that referenced
this pull request
Apr 10, 2025
…89318) When we send an event to Seer during ingest, if Seer doesn't find a match, it stores the event's data in its database, so that future events can match with it. Conversely, if Seer does find a match, it doesn't store the new event's data, under the presumption that any future events which would match it would also match the event it matched.* Now that we're sending events with hybrid fingerprints to Seer, though, that logic breaks down, since an event being a stacktrace distance match doesn't guarantee that it's an overall match, because either or both could have a hybrid fingerprint, and the non-default parts of those potential hybrid fingerprints might or might not match. In other words, we can no longer assume that the event already in Seer and the new event are equivalent in terms of matching to future events. If neither the current event nor the existing event in Seer has a hybrid fingerprint, or if they both do _and_ the non-default parts of their fingerprints match, the assumption holds. But if neither of those is the case, future events could match one but not the other, meaning both need to be represented in the Seer database. This PR thus adds a second request to Seer to store the new event in the case where none of the results returned by Seer are an overall match. In order to force the event to be stored, the request sets the desired number of matches to 0, so that Seer sees the event as one with no matches. It also turns off re-ranking, in order to prevent Seer from pulling an initial list of matches to whittle down, since we know we're not going to use any of them anyway. *In terms of stacktrace distance, this is not perfectly theoretically true - something just barely close enough to the new event to count might be just slightly too far from the existing event - but in practice those corner cases are rare enough that we don't worry about them.
FYI: reverting due to suspected increase in database insert durations. need to determine root cause |
PR reverted: da65e5f |
Redone in #89563. |
lobsterkatie
added a commit
that referenced
this pull request
Apr 15, 2025
…take 2) (#89563) This is a redo of #89318, which got reverted in the process of debugging some Seer latency issues. No changes from the original version, except that now the behavior is gated by an option, `seer.similarity.ingest.store_hybrid_fingerprint_non_matches`. The option defaults to `True`, so the effect of merging this will be the same as before, but we'll have an easier, faster way to turn it off this time, should we decide that's necessary. The original description is copied below for reference. ----- When we send an event to Seer during ingest, if Seer doesn't find a match, it stores the event's data in its database, so that future events can match with it. Conversely, if Seer does find a match, it doesn't store the new event's data, under the presumption that any future events which would match it would also match the event it matched.* Now that we're sending events with hybrid fingerprints to Seer, though, that logic breaks down, since an event being a stacktrace distance match doesn't guarantee that it's an overall match, because either or both could have a hybrid fingerprint, and the non-default parts of those potential hybrid fingerprints might or might not match. In other words, we can no longer assume that the event already in Seer and the new event are equivalent in terms of matching to future events. If neither the current event nor the existing event in Seer has a hybrid fingerprint, or if they both do _and_ the non-default parts of their fingerprints match, the assumption holds. But if neither of those is the case, future events could match one but not the other, meaning both need to be represented in the Seer database. This PR thus adds a second request to Seer to store the new event in the case where none of the results returned by Seer are an overall match. In order to force the event to be stored, the request sets the desired number of matches to 0, so that Seer sees the event as one with no matches. It also turns off re-ranking, in order to prevent Seer from pulling an initial list of matches to whittle down, since we know we're not going to use any of them anyway. *In terms of stacktrace distance, this is not perfectly theoretically true - something just barely close enough to the new event to count might be just slightly too far from the existing event - but in practice those corner cases are rare enough that we don't worry about them.
andrewshie-sentry
pushed a commit
that referenced
this pull request
Apr 22, 2025
…89318) When we send an event to Seer during ingest, if Seer doesn't find a match, it stores the event's data in its database, so that future events can match with it. Conversely, if Seer does find a match, it doesn't store the new event's data, under the presumption that any future events which would match it would also match the event it matched.* Now that we're sending events with hybrid fingerprints to Seer, though, that logic breaks down, since an event being a stacktrace distance match doesn't guarantee that it's an overall match, because either or both could have a hybrid fingerprint, and the non-default parts of those potential hybrid fingerprints might or might not match. In other words, we can no longer assume that the event already in Seer and the new event are equivalent in terms of matching to future events. If neither the current event nor the existing event in Seer has a hybrid fingerprint, or if they both do _and_ the non-default parts of their fingerprints match, the assumption holds. But if neither of those is the case, future events could match one but not the other, meaning both need to be represented in the Seer database. This PR thus adds a second request to Seer to store the new event in the case where none of the results returned by Seer are an overall match. In order to force the event to be stored, the request sets the desired number of matches to 0, so that Seer sees the event as one with no matches. It also turns off re-ranking, in order to prevent Seer from pulling an initial list of matches to whittle down, since we know we're not going to use any of them anyway. *In terms of stacktrace distance, this is not perfectly theoretically true - something just barely close enough to the new event to count might be just slightly too far from the existing event - but in practice those corner cases are rare enough that we don't worry about them.
andrewshie-sentry
pushed a commit
that referenced
this pull request
Apr 22, 2025
…take 2) (#89563) This is a redo of #89318, which got reverted in the process of debugging some Seer latency issues. No changes from the original version, except that now the behavior is gated by an option, `seer.similarity.ingest.store_hybrid_fingerprint_non_matches`. The option defaults to `True`, so the effect of merging this will be the same as before, but we'll have an easier, faster way to turn it off this time, should we decide that's necessary. The original description is copied below for reference. ----- When we send an event to Seer during ingest, if Seer doesn't find a match, it stores the event's data in its database, so that future events can match with it. Conversely, if Seer does find a match, it doesn't store the new event's data, under the presumption that any future events which would match it would also match the event it matched.* Now that we're sending events with hybrid fingerprints to Seer, though, that logic breaks down, since an event being a stacktrace distance match doesn't guarantee that it's an overall match, because either or both could have a hybrid fingerprint, and the non-default parts of those potential hybrid fingerprints might or might not match. In other words, we can no longer assume that the event already in Seer and the new event are equivalent in terms of matching to future events. If neither the current event nor the existing event in Seer has a hybrid fingerprint, or if they both do _and_ the non-default parts of their fingerprints match, the assumption holds. But if neither of those is the case, future events could match one but not the other, meaning both need to be represented in the Seer database. This PR thus adds a second request to Seer to store the new event in the case where none of the results returned by Seer are an overall match. In order to force the event to be stored, the request sets the desired number of matches to 0, so that Seer sees the event as one with no matches. It also turns off re-ranking, in order to prevent Seer from pulling an initial list of matches to whittle down, since we know we're not going to use any of them anyway. *In terms of stacktrace distance, this is not perfectly theoretically true - something just barely close enough to the new event to count might be just slightly too far from the existing event - but in practice those corner cases are rare enough that we don't worry about them.
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Labels
Scope: Backend
Automatically applied to PRs that change backend components
Trigger: Revert
Add to a merged PR to revert it (skips CI)
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
When we send an event to Seer during ingest, if Seer doesn't find a match, it stores the event's data in its database, so that future events can match with it. Conversely, if Seer does find a match, it doesn't store the new event's data, under the presumption that any future events which would match it would also match the event it matched.*
Now that we're sending events with hybrid fingerprints to Seer, though, that logic breaks down, since an event being a stacktrace distance match doesn't guarantee that it's an overall match, because either or both could have a hybrid fingerprint, and the non-default parts of those potential hybrid fingerprints might or might not match. In other words, we can no longer assume that the event already in Seer and the new event are equivalent in terms of matching to future events.
If neither the current event nor the existing event in Seer has a hybrid fingerprint, or if they both do and the non-default parts of their fingerprints match, the assumption holds. But if neither of those is the case, future events could match one but not the other, meaning both need to be represented in the Seer database.
This PR thus adds a second request to Seer to store the new event in the case where none of the results returned by Seer are an overall match. In order to force the event to be stored, the request sets the desired number of matches to 0, so that Seer sees the event as one with no matches. It also turns off re-ranking, in order to prevent Seer from pulling an initial list of matches to whittle down, since we know we're not going to use any of them anyway.
*In terms of stacktrace distance, this is not perfectly theoretically true - something just barely close enough to the new event to count might be just slightly too far from the existing event - but in practice those corner cases are rare enough that we don't worry about them.