Add CPU profile collection to diagnostics handler #4394

Merged
merged 5 commits into from
Mar 13, 2024


@michel-laterman (Contributor) commented Mar 11, 2024

What does this PR do?

Adds optional CPU profile collection to the diagnostics action handler.
CPU profiles are collected when the optional additional_metrics parameter
list of the REQUEST_DIAGNOSTICS action contains "CPU".
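A minimal Go sketch of the check described above, assuming the action arrives as JSON with additional_metrics nested under a data object. The diagnosticsAction type, that JSON layout, and the parseAction/wantsCPUProfile helpers are hypothetical illustrations; the real handler in elastic-agent uses its own fleetapi types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// diagnosticsAction is a hypothetical, trimmed-down view of a
// REQUEST_DIAGNOSTICS action. The nesting of additional_metrics
// under "data" is an assumption made for illustration only.
type diagnosticsAction struct {
	Type string `json:"type"`
	Data struct {
		AdditionalMetrics []string `json:"additional_metrics"`
	} `json:"data"`
}

// wantsCPUProfile reports whether the optional additional_metrics list
// asks for a CPU profile. A missing or empty list means no.
func wantsCPUProfile(metrics []string) bool {
	for _, m := range metrics {
		if m == "CPU" {
			return true
		}
	}
	return false
}

// parseAction decodes a raw action and reports whether CPU profiling
// was requested.
func parseAction(raw []byte) (bool, error) {
	var a diagnosticsAction
	if err := json.Unmarshal(raw, &a); err != nil {
		return false, err
	}
	if a.Type != "REQUEST_DIAGNOSTICS" {
		return false, fmt.Errorf("unexpected action type %q", a.Type)
	}
	return wantsCPUProfile(a.Data.AdditionalMetrics), nil
}

func main() {
	withCPU := []byte(`{"type":"REQUEST_DIAGNOSTICS","data":{"additional_metrics":["CPU"]}}`)
	without := []byte(`{"type":"REQUEST_DIAGNOSTICS","data":{}}`)

	a, _ := parseAction(withCPU)
	b, _ := parseAction(without)
	fmt.Println(a, b) // true false
}
```

Because the parameter is optional, an action without additional_metrics decodes to a nil slice and simply skips the CPU profile.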

Why is it important?

This makes diagnostics requested through Fleet feature-complete compared to the CLI options.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in ./changelog/fragments using the changelog tool
  • I have added an integration test or an E2E test

Testing

Changes have been tested with an e2e workflow, see elastic/fleet-server#3333 (comment)

Related issues

@michel-laterman added the enhancement (New feature or request) and Team:Elastic-Agent-Control-Plane (Label for the Agent Control Plane team) labels Mar 11, 2024
@michel-laterman requested a review from a team as a code owner March 11, 2024 10:45
@elasticmachine (Contributor)

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

mergify bot (Contributor) commented Mar 11, 2024

This pull request does not have a backport label. Could you fix it @michel-laterman? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v\d.\d.\d is the label to automatically backport to the 8.\d branch. \d is the digit

NOTE: backport-skip has been added to this pull request.

@michel-laterman (Contributor, Author)

The fleet-server changes are here: elastic/fleet-server#3333
We will need an option to add the parameter when we create a diagnostics action in the UI.

In the future we could also specify a duration parameter as part of the action, but that will need support from the elastic-agent-client-libs repo before we can actually pass it through (especially to PerformComponentDiagnostics).

}
diags = append(diags, client.DiagnosticFileResult{
Name: "cpuprofile",
Filename: "cpu.pprof",
Contributor Author

I copied metadata from the control server: https://github.com/elastic/elastic-agent/blob/main/pkg/control/v2/server/server.go#L214-L221
We use cpu.pprof, but all other traces use *.pprof.gz.

Should we alter diagnostics.CreateCPUProfile in order to compress the resulting trace?

Member

It is probably correct to change the name to whatever comes back from a GET to http://localhost:6060/debug/pprof/profile?seconds=30

The parameters here could also be pulled out into a shared set of constants, much like the profile duration.
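A sketch for checking what that GET returns, using an in-process httptest server wired to the standard net/http/pprof Profile handler in place of localhost:6060. This assumes the agent's endpoint uses that standard handler, which is not confirmed here. The sketch prints the Content-Type and the filename from Content-Disposition rather than assuming them, since both come from the Go version in use. fetchProfile is a hypothetical helper.

```go
package main

import (
	"fmt"
	"io"
	"mime"
	"net/http"
	"net/http/httptest"
	"net/http/pprof"
)

// fetchProfile issues the same kind of request the reviewer describes,
// but against a test server serving the standard pprof CPU profile handler.
func fetchProfile(seconds int) (contentType, filename string, body []byte, err error) {
	srv := httptest.NewServer(http.HandlerFunc(pprof.Profile))
	defer srv.Close()

	resp, err := http.Get(fmt.Sprintf("%s/debug/pprof/profile?seconds=%d", srv.URL, seconds))
	if err != nil {
		return "", "", nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", "", nil, fmt.Errorf("unexpected status %s", resp.Status)
	}

	body, err = io.ReadAll(resp.Body)
	if err != nil {
		return "", "", nil, err
	}
	// Content-Disposition carries the filename the handler suggests.
	if _, params, perr := mime.ParseMediaType(resp.Header.Get("Content-Disposition")); perr == nil {
		filename = params["filename"]
	}
	return resp.Header.Get("Content-Type"), filename, body, nil
}

func main() {
	ct, name, body, err := fetchProfile(1)
	if err != nil {
		panic(err)
	}
	fmt.Printf("Content-Type=%q filename=%q bytes=%d\n", ct, name, len(body))
}
```

Whatever filename and content type it reports could then sit in one set of shared constants next to the profile duration.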

Quality Gate passed

The SonarQube Quality Gate passed, but some issues were introduced.

1 New issue
0 Security Hotspots
81.1% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube

@michel-laterman merged commit 7541561 into elastic:main Mar 13, 2024
9 checks passed
@michel-laterman deleted the diag-handler-cpu branch March 13, 2024 08:13
@cmacknz (Member) commented Mar 13, 2024

If there is a Fleet API that lets us request this, we could write an E2E test for diagnostics requests in this repository. @michel-laterman, is there an API for this? Is there a change on the Kibana side to follow?

Labels
backport-skip, enhancement (New feature or request), Team:Elastic-Agent-Control-Plane (Label for the Agent Control Plane team)
Development

Successfully merging this pull request may close these issues.

Allow collecting CPU diagnostics in the Fleet diagnostics action handler
4 participants