Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(slo): cleanup temp summary documents task #210264

Merged
merged 17 commits into from
Feb 20, 2025

Conversation

kdelemme
Copy link
Contributor

@kdelemme kdelemme commented Feb 7, 2025

Resolves #210194

🏇🏻 Summary

These temporary documents are primarily used to return a placeholder SLO Instance after it has been created in the UI, until the two layers of transform generate one or more actual SLO Instances for that SLO Definition. However, we deduplicate these documents only when both a temporary and a non-temporary summary document are returned in the same search query. This situation can never occur if there are more SLO Instances than the page size, resulting in temporary SLO Instances never being deleted.

This PR introduces a new task that runs every hour to delete temporary summary documents that are no longer needed, as real summary documents have been generated by the transforms. As a circuit breaker, we count the number of temp summary documents and return early if nothing is found.

@kdelemme kdelemme added release_note:skip Skip the PR/issue when compiling release notes v9.0.0 Team:obs-ux-management Observability Management User Experience Team backport:skip This commit does not require backporting labels Feb 10, 2025
@kdelemme kdelemme force-pushed the feat/cleanup-temp-summary-task branch from 5fa8d8e to 9d1364a Compare February 10, 2025 16:50
@kdelemme kdelemme marked this pull request as ready for review February 10, 2025 16:50
@kdelemme kdelemme requested a review from a team as a code owner February 10, 2025 16:50
@elasticmachine
Copy link
Contributor

Pinging @elastic/obs-ux-management-team (Team:obs-ux-management)

@kdelemme kdelemme added v9.1.0 and removed v9.0.0 labels Feb 10, 2025
@kdelemme kdelemme force-pushed the feat/cleanup-temp-summary-task branch from 526979b to 6f8394a Compare February 11, 2025 01:41
@kdelemme kdelemme requested a review from a team as a code owner February 11, 2025 01:44
@@ -180,6 +189,8 @@ export class SLOPlugin
?.start(plugins.taskManager, internalSoClient, internalEsClient)
.catch(() => {});

this.tempSummaryCleanupTask?.start({ taskManager: plugins.taskManager }).catch(() => {});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ideally this should only need to be started if user adds SLOs for the first time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or in the case they have some existing SLOs and are only migrating from a previous version? Handling every cases might become tricky.
There is a feature flag that can be used to turn off this task, if someone's complaining about this task and don't use SLOs, it can be disabled easily.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how can it be disabled in cloud? cloud has an allow list for which kibana configs can be added.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then the question is how can we update that allow list?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i am not sure myself, on cloud first region you can add all configs but not on other regions. but i think it's highly unlikely it will be needed in this case.

@kdelemme kdelemme force-pushed the feat/cleanup-temp-summary-task branch from af95a53 to 46150c8 Compare February 11, 2025 15:17
@kdelemme kdelemme force-pushed the feat/cleanup-temp-summary-task branch from 335baa9 to 752ce38 Compare February 12, 2025 19:18
@kdelemme kdelemme requested a review from a team February 17, 2025 21:21
Copy link
Contributor

@shahzad31 shahzad31 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, with suggestions !!

IMO all those log.info should be log.debug, it's an unnecessary info for typical kibana user.

@kdelemme
Copy link
Contributor Author

Thanks @shahzad31, I'll update the logger level used and the terminate after 👍🏻

@kdelemme kdelemme force-pushed the feat/cleanup-temp-summary-task branch from 86c3927 to 9f06a98 Compare February 18, 2025 16:41
@elasticmachine
Copy link
Contributor

💚 Build Succeeded

Metrics [docs]

Module Count

Fewer modules leads to a faster build time

id before after diff
observability 1405 1406 +1
slo 1006 1007 +1
synthetics 970 971 +1
total +3

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

id before after diff
@kbn/slo-schema 183 187 +4

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id before after diff
observability 1.3MB 1.3MB +248.0B
slo 795.3KB 795.6KB +246.0B
synthetics 788.4KB 788.7KB +248.0B
total +742.0B

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id before after diff
slo 42.2KB 42.2KB +16.0B
Unknown metric groups

API count

id before after diff
@kbn/slo-schema 183 187 +4

History

@kdelemme kdelemme requested a review from ymao1 February 20, 2025 13:49
@kdelemme kdelemme enabled auto-merge (squash) February 20, 2025 13:49
@kdelemme kdelemme requested review from ymao1 and removed request for ymao1 February 20, 2025 13:57
Copy link
Contributor

@ymao1 ymao1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Response Ops changes LGTM. New task type with 1h cadence and 2m timeout.

@kdelemme kdelemme merged commit 875a42c into elastic:main Feb 20, 2025
9 checks passed
@kdelemme kdelemme deleted the feat/cleanup-temp-summary-task branch February 20, 2025 16:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport:skip This commit does not require backporting release_note:skip Skip the PR/issue when compiling release notes Team:obs-ux-management Observability Management User Experience Team v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[SLO] [Management] deduping task for temporary summary documents
4 participants