feat(workflow_engine): Add state tracking for priority levels for stateful detectors #91791


Merged

saponifi3d merged 16 commits into master from jcallender/aci/ref-stateful-priorities

May 21, 2025

Contributor

saponifi3d commented May 16, 2025 (edited)

Description

Implement state tracking for threshold changes on detectors.

This is part 4 of the great stateful detector refactoring. I'm also taking a moment to document these changes. For now, the docs are added as inline markdown. (Not sure whether we have a way to do docgen from comment headers.)

Up next is the final refactoring: make the Stateful handler easier to reckon with when grouping.

vercel bot deployed to Preview

May 16, 2025 05:47

github-actions bot added the Scope: Backend label

vercel bot deployed to Preview

May 16, 2025 06:06

vercel bot deployed to Preview

May 16, 2025 06:11

saponifi3d marked this pull request as ready for review

May 16, 2025 06:21

saponifi3d requested a review from a team as a code owner

May 16, 2025 06:21

codecov bot commented May 16, 2025 (edited)

Codecov Report

Attention: Patch coverage is 94.21053% with 11 lines in your changes missing coverage. Please review.

⚠️ Parser warning

The parser emitted a warning. Please review your JUnit XML file:

Warning while parsing testcase attributes: Limit of string is 1000 chars, for name, we got 2083 at 1:157236 in /home/runner/work/sentry/sentry/.artifacts/pytest.junit.xml

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...ntry/workflow_engine/handlers/detector/stateful.py | 83.33% | 11 Missing ⚠️ |

Additional details and impacted files

@@           Coverage Diff            @@
##           master   #91791    +/-   ##
========================================
  Coverage   87.63%   87.63%            
========================================
  Files       10356    10357     +1     
  Lines      587192   587355   +163     
  Branches    22585    22585            
========================================
+ Hits       514558   514712   +154     
- Misses      72206    72215     +9     
  Partials      428      428

ceorourke reviewed

tests/sentry/workflow_engine/handlers/detector/test_stateful.py (resolved)

ceorourke approved these changes

Member

ceorourke left a comment

Changes LGTM. Regarding documentation, I think we have something for public-facing documentation generation but not for internal docs. Typically we just write big docstrings, but this is fine as long as people find it.

vercel bot deployed to Preview

May 16, 2025 17:11

saponifi3d requested a review from evanpurkhiser

May 16, 2025 17:16

saponifi3d assigned wedamija

wedamija reviewed

src/sentry/workflow_engine/handlers/detector/stateful.py Outdated

-                      storing counter values.
-                      """
-                      pass
+                      # Merge all kinds of counters together for the state manager

Member

wedamija May 16, 2025

Nit: These are already merged in self._thresholds

Contributor Author

saponifi3d May 16, 2025

The way I was thinking about this is that there might be additional counters that the implementer wants to track in their detector handler. This is meant to leave room for that override fairly easily, but yeah, I feel like this is probably an over-optimization. I can simplify to just having threshold counters on the stateful detector for now.

src/sentry/workflow_engine/handlers/detector/stateful.py Outdated

+                          **self._thresholds,
+                      }
+                      self.state_manager = DetectorStateManager(detector, list(counters.keys()))

Member

wedamija May 16, 2025

Should we just use list(self._thresholds.keys()) here?

src/sentry/workflow_engine/handlers/detector/stateful.py Outdated (resolved)

src/sentry/workflow_engine/handlers/detector/stateful.py Outdated

Comment on lines 385 to 443

+                      updated_status_count = (state.counter_updates.get(new_priority) or 0) + 1
+                      self.state_manager.enqueue_counter_update(None, {new_priority: updated_status_count})
+                      if self._thresholds[new_priority] > updated_status_count:
+                          # We haven't met the threshold yet, so don't trigger
+                          return None

Member

wedamija May 16, 2025

One thing to keep in mind: higher priorities should probably also increment the counts for all lower priorities.

For metric alerts, if we had a threshold of 3, and then we got crit, crit, warn, the warning should fire.

Contributor Author

saponifi3d May 16, 2025

🤔 makes sense. will update.
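A minimal sketch of the counting rule suggested in this thread, assuming a hypothetical `Priority` enum and a hypothetical `record_result` helper (neither is taken from the PR). Each result counts toward its own level and every lower configured level. Resetting on an OK result and clearing higher-level counters are left out of the sketch.

```python
from enum import IntEnum


class Priority(IntEnum):
    # Hypothetical stand-in for DetectorPriorityLevel; names and values are illustrative.
    WARNING = 1
    CRITICAL = 2


def record_result(counters, thresholds, new_priority):
    """Count new_priority toward itself and every lower configured level.

    Mutates `counters` in place and returns the highest level whose
    threshold is now met, or None if no threshold has been reached.
    """
    for level in thresholds:
        if level <= new_priority:
            counters[level] = counters.get(level, 0) + 1
    met = [level for level in thresholds if counters.get(level, 0) >= thresholds[level]]
    return max(met) if met else None


# The example from the review comment: threshold of 3, then crit, crit, warn.
thresholds = {Priority.WARNING: 3, Priority.CRITICAL: 3}
counters = {}
results = [
    record_result(counters, thresholds, priority)
    for priority in (Priority.CRITICAL, Priority.CRITICAL, Priority.WARNING)
]
```

In this sketch the third result brings the warning counter to 3 while the critical counter stays at 2, so only the warning level is returned.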

evanpurkhiser mentioned this pull request

feat(workflow_engine): Implement priority thresholds #91106

Closed

saponifi3d force-pushed the jcallender/aci/ref-stateful-priorities branch from 16408ae to 9b9e176 Compare

May 19, 2025 19:02

vercel bot deployed to Preview

May 19, 2025 19:07

vercel bot deployed to Preview

May 19, 2025 23:34

vercel bot deployed to Preview

May 19, 2025 23:40

saponifi3d commented

src/sentry/workflow_engine/handlers/detector/stateful.py

Comment on lines +293 to +306

+                  def _get_configured_detector_levels(self) -> list[DetectorPriorityLevel]:
+                      # TODO - Is this something that should be provided by the detector itself rather
+                      # than having to query the db for each level?
+                      priority_levels: list[DetectorPriorityLevel] = [level for level in DetectorPriorityLevel]
+                      if self.detector.workflow_condition_group is None:
+                          # TODO - Should this default to _all_ levels or no levels?
+                          return priority_levels
+                      condition_result_levels = self.detector.workflow_condition_group.conditions.filter(
+                          condition_result__in=priority_levels
+                      ).values_list("condition_result", flat=True)
+                      return list(DetectorPriorityLevel(level) for level in condition_result_levels)

Contributor Author

saponifi3d May 19, 2025

@wedamija this is how I was thinking we could filter to just the configured levels. Is there any way we can easily cache this data on the model, or should I make a field on the model that looks up all these values and caches them there? 🤔
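One possible shape for the caching asked about here, sketched with `functools.cached_property` from the standard library. `HandlerSketch` and `fetch_levels` are hypothetical stand-ins for the handler and the condition-group query; this is not the approach the PR settled on, and it does not address invalidation when a detector's conditions change.

```python
from functools import cached_property


class HandlerSketch:
    # Hypothetical handler; `fetch_levels` stands in for the condition-group query.
    def __init__(self, fetch_levels):
        self._fetch_levels = fetch_levels

    @cached_property
    def configured_levels(self):
        # Runs the lookup once per handler instance; later reads reuse the same list.
        return list(self._fetch_levels())


calls = []


def fake_query():
    # Records each invocation so the caching behavior can be observed.
    calls.append(1)
    return [1, 2]


handler = HandlerSketch(fake_query)
first = handler.configured_levels
second = handler.configured_levels
```

The cache lives on the instance, so a new handler instance would query again; whether that lifetime fits the detector model is an open question in the thread.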

saponifi3d requested a review from wedamija

May 19, 2025 23:50

saponifi3d force-pushed the jcallender/aci/ref-stateful-priorities branch from 58fe124 to 3a9cf30 Compare

May 20, 2025 00:30

saponifi3d added 9 commits

May 19, 2025 17:34

a683706 starting to like going down this path, it means the counter updates can stay generic and stateful handlers are explicit

e7859fd add counters and verify they work as expected while evaluating detectors

9fb3724 ensure counter thresholds are working to escalate

c1f628d clean-up the tests and reset the counters if the detector is resolved

2ab9905 make sure the state is only updated if it's a new state change and the conditions were met. make that clearer too

d82c365 Update the test base to have better typing and re-add that helper i removed

3e0e910 update tests to use the new detector state

65f1740 Move the docs into pretty markdown

a4aebad a bit more docs

saponifi3d added 7 commits

May 19, 2025 17:34

3d0bfe8 simplify counters on stateful detector to just be the thresholds for now. can make more generic when cases arise

73c6093 fix the linter comment, also change the order of the test to short circuit the if statement putting the faster check first

828bd4b change how the state gets incremented, include any priorities lower than the triggered priority as well. handle OK separately

a64450e Update the state checks to find if there is any potential breach. Need to think about if this should be filtered to only configured thresholds or not.

df0e9cd leave a comment for future me

fix the processing and some of the tests to be dynamic to the configurations.

unify naming schemes a little more -- starting to see the next pr

saponifi3d force-pushed the jcallender/aci/ref-stateful-priorities branch from 3a9cf30 to 2611032 Compare

May 20, 2025 00:35

vercel bot deployed to Preview

May 20, 2025 00:40

saponifi3d merged commit 27067a1 into master

60 checks passed

saponifi3d deleted the jcallender/aci/ref-stateful-priorities branch

May 21, 2025 17:17
