[Fleet] Improve package policy upgrade performance #205332

Open
nchaulet opened this issue Dec 31, 2024 · 6 comments
Assignees
Labels
rca-action Team:Fleet Team label for Observability Data Collection Fleet team

Comments

@nchaulet
Member

nchaulet commented Dec 31, 2024

Description

Upgrading package policies after a package upgrade can be very slow when there are many package policies to upgrade or when the package is large (aws is a good example to use for testing).

Note: there also seems to be an issue in the UI, as we call the dry-run upgrade API on the settings page to show the checkbox for upgrading integration policies.

Possible optimizations

Work in progress

  • Group agent policy revision bumps: we currently bump the agent policy revision once for every package policy. If multiple package policies use the same agent policy, we could avoid repeating that (expensive) operation
  • Stop fetching all ES assets and fetch only the ones needed (to compile agent inputs and streams we probably only need the Handlebars templates)
  • Better package info cache
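
As a rough illustration of the first option, here is a minimal sketch of grouping the revision bumps. It is not the Fleet implementation: `PackagePolicyRef`, `agentPolicyIds`, `collectAgentPolicyIds`, `bumpOncePerAgentPolicy`, and the `bumpRevision` callback are all hypothetical names used only for this example.

```typescript
// Hypothetical shape; the real Fleet package policy type differs.
interface PackagePolicyRef {
  id: string;
  agentPolicyIds: string[];
}

// Hypothetical callback standing in for the expensive revision bump.
type BumpRevision = (agentPolicyId: string) => Promise<void>;

// Collect the distinct agent policy IDs referenced by a batch of package policies.
function collectAgentPolicyIds(packagePolicies: PackagePolicyRef[]): string[] {
  const ids = new Set<string>();
  for (const packagePolicy of packagePolicies) {
    for (const agentPolicyId of packagePolicy.agentPolicyIds) {
      ids.add(agentPolicyId);
    }
  }
  return Array.from(ids);
}

// Bump each affected agent policy once for the whole batch,
// instead of once per upgraded package policy.
async function bumpOncePerAgentPolicy(
  packagePolicies: PackagePolicyRef[],
  bumpRevision: BumpRevision
): Promise<number> {
  const agentPolicyIds = collectAgentPolicyIds(packagePolicies);
  for (const agentPolicyId of agentPolicyIds) {
    await bumpRevision(agentPolicyId);
  }
  return agentPolicyIds.length;
}
```

With 10 package policies attached to the same agent policy, this sketch would call `bumpRevision` once rather than 10 times.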
nchaulet added the Team:Fleet label Dec 31, 2024
@elasticmachine
Contributor

Pinging @elastic/fleet (Team:Fleet)

@kpollich
Member

  • Group agent policy revision bumps: we currently bump the agent policy revision once for every package policy. If multiple package policies use the same agent policy, we could avoid repeating that (expensive) operation
  • Stop fetching all ES assets and fetch only the ones needed (to compile agent inputs and streams we probably only need the Handlebars templates)

These two options seem like the most immediately actionable, so I think we should pursue these in the short term.
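
For the asset-fetching option quoted above, a minimal sketch of the idea is to filter the asset list down to templates before fetching anything. This assumes templates can be recognized by a `.hbs` file extension; `selectTemplatePaths` and the example paths are hypothetical and not taken from the Fleet code.

```typescript
// Assumption: Handlebars templates are identifiable by a ".hbs" suffix.
// Everything else (dashboards, ingest pipelines, docs, ...) is skipped.
function selectTemplatePaths(assetPaths: string[]): string[] {
  return assetPaths.filter((path) => path.endsWith(".hbs"));
}

// Hypothetical asset paths, for illustration only.
const examplePaths = [
  "example-1.0.0/data_stream/logs/agent/stream/stream.yml.hbs",
  "example-1.0.0/kibana/dashboard/overview.json",
  "example-1.0.0/docs/README.md",
];

const templatePaths = selectTemplatePaths(examplePaths);
```

Whether this is sufficient depends on what the compile step actually reads, which would need to be confirmed by measurement.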

  • Better package info cache

I think we'd need to spend some time thinking about what this means and what kind of improvements we can make. I suspect the improvements above will be sufficient to optimize the integration upgrade process, but we should measure along the way to be sure.

@cmacknz
Member

cmacknz commented Jan 16, 2025

Do we have any kind of profiling or measurement of where the performance problem is coming from?

The suggestions sound reasonable, but it's not clear yet what the specific problems are or that these changes will solve them. I would prefer to start with measurement and instrumentation so we can tell when we've improved things and that we are focusing on the correct problem.

@nchaulet
Member Author

I started to add some instrumentation in an environment with 1 agent policy with 10 aws package policies;
upgrading those package policies takes 41s

Image

I am trying to add a few more spans to see what is taking time. Bumping the agent policy revision is clearly a big operation (and one we are doing too many times).

Image

Will try to collect more data.
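
To make this kind of measurement concrete, here is a minimal sketch of a timing wrapper that records named spans and totals them per step. It is a plain stand-in, not the instrumentation used for the screenshots above; `SpanRecorder`, `measure`, and `totals` are hypothetical names.

```typescript
interface SpanRecord {
  name: string;
  durationMs: number;
}

interface SpanTotal {
  count: number;
  totalMs: number;
}

class SpanRecorder {
  readonly spans: SpanRecord[] = [];

  // Run an async step and record how long it took, even if it throws.
  async measure<T>(name: string, fn: () => Promise<T>): Promise<T> {
    const start = Date.now();
    try {
      return await fn();
    } finally {
      this.spans.push({ name, durationMs: Date.now() - start });
    }
  }

  // Aggregate call count and total time per span name.
  totals(): Map<string, SpanTotal> {
    const result = new Map<string, SpanTotal>();
    for (const span of this.spans) {
      const entry = result.get(span.name) ?? { count: 0, totalMs: 0 };
      entry.count += 1;
      entry.totalMs += span.durationMs;
      result.set(span.name, entry);
    }
    return result;
  }
}
```

Wrapping each step of the upgrade (for example `recorder.measure("bumpRevision", () => ...)`) and comparing `totals()` before and after a change would show both how often a step runs and how much time it accounts for.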

@whyyouwannaknow

Hello,

Is there any news on this issue?
We struggle every time we need to upgrade our 101 configured AWS integrations...

It is a nightmare every time, as we want to keep them up to date to get the latest implementation and fixes.

Thank you,

@kpollich
Member

kpollich commented Mar 4, 2025

Hey @whyyouwannaknow, we are actively working on improvements in this area, covering both memory consumption and timing. See #211961 for some recent work on memory consumption during package installation/upgrade.

This is currently a high priority for us.

6 participants