-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Fleet] Allow collection CPU profiles when collecting agent diagnostics #162849
Comments
Pinging @elastic/fleet (Team:Fleet) |
@zombieFox could you please help us out here with the design? |
@nimarezainia @zombieFox I don't see any update here, we are we at? Are we just going down the path of adding a simple checkbox in the UI below? |
We should use the ability to generate UI from config to do this. I'll add it to the list. (fyi @amitkanfer @kpollich ) |
where's the list ? :) |
Discussed today with @zombieFox and @nimarezainia, we all agreed to start with a checkbox to enable CPU profile collection. Hence moving this to a ready state. |
@kpollich @juliaElastic for the request diagnostic, can we send the |
I think we have to extend the |
So this should be available only if fleet server is in the right version though. |
Checked this further and I think it also needs changes on elastic-agent side, to add a cpuProfile flag to the request diagnostics action and apply the flag when collecting diagnostics in the handle function. |
Correct the change needs to be made across Fleet, Fleet Server, and Elastic Agent. Prior to making the Elastic Agent change the new parameter in the diagnostics action needs to be defined. You can't test any of this without implementing the whole system, the easiest option is to have whoever makes the Fleet Server (and possibly Fleet) change also change the agent. This is just passing in a function argument that already exists in the right place. We could also have multiple people do this, that's just more coordination overhead. I wrote up how to do this in detail in elastic/elastic-agent#3491. If the Fleet changes come first and are merged into a snapshot image we can write an E2E test for it in the agent repository. This could be done after manual verification end to end though. |
@cmacknz changed the area for: elastic/elastic-agent#3491 As I believe one of the fleet engineer can do both Elastic Agent and Fleet Server change at the same time. |
moving this to sprint 27 as the item blocking it is in sprint 26 |
Changes to fleet-server are here: elastic/fleet-server#3333 |
We should add a confirmation modal to the request diagnostics flow that allows the user to check a checkbox for |
## Summary Closes #162849 Added option to request agent diagnostics with CPU metrics. This is a new checkbox in Request diagnostics modal. When clicked, the action will contain a `additional_metrics: ["CPU"]`. Also replaced the click handler of `Request diagnotics .zip` button in Agent Details / Diagnostics tab, to open the same modal instead of immediately triggering the action. To verify: - Enroll 2 agents with version `8.14-SNAPSHOT` - Click on Agent diagnostics action in Agent list / Agent details - Verify that the CPU checkbox appears in the modal window - Verify that the diagnotics zip contains `cpu.pprof` file - Verify the same in bulk request diagnostics action when selecting 2 agents <img width="1412" alt="image" src="https://github.com/elastic/kibana/assets/90178898/4b8ee884-42e7-4da1-91cb-903fa05a2abf"> Action created in `.fleet-actions`: ``` { "_index": ".fleet-actions-7", "_id": "224e484f-fc49-4fba-989e-8ac30db29eff", "_score": 1, "_source": { "@timestamp": "2024-04-02T12:24:49.062Z", "expiration": "2024-04-02T12:27:49.062Z", "agents": [ "b4a5d936-265c-40b1-be6a-d13ca91bed36" ], "action_id": "1fc01f99-c511-49bf-9350-96ddf537b562", "data": { "additional_metrics": [ "CPU" ] }, "type": "REQUEST_DIAGNOSTICS", "traceparent": "00-a5e3c645397964952efbeceed896e9db-03730cd3f6d60955-01" } }, ``` ### Checklist Delete any items that are not applicable to this PR. - [x] Any text added follows [EUI's writing guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses sentence case text and includes [i18n support](https://github.com/elastic/kibana/blob/main/packages/kbn-i18n/README.md) - [ ] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
elastic/elastic-agent#2140 restored the ability of the
elastic-agent diagnostics
command to collect CPU profiles, which is critically important when diagnosing CPU performance problems.CPU profile collection is optional and is done when the
--cpu-profiles
option is provided likeelastic-agent diagnostics --cpu-profile
.Add a check box or toggle allowing users to opt into collecting CPU profiles when collecting diagnostics through Fleet. The diagnostics action will need to be modified to include this new boolean switch.
Also note that each CPU profile takes approximately 30s to capture, so it is likely that the timeout for the diagnostics action will need to be increased when this option is selected.
Depends on elastic/elastic-agent#3491
The text was updated successfully, but these errors were encountered: