Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Detection Engine] Verify efficient usage of createPointInTimeFinder #211637

Closed
yctercero opened this issue Feb 18, 2025 · 7 comments
Closed

[Detection Engine] Verify efficient usage of createPointInTimeFinder #211637

yctercero opened this issue Feb 18, 2025 · 7 comments
Assignees
Labels
Team:Detection Engine Security Solution Detection Engine Area Team:Detections and Resp Security Detection Response Team Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. technical debt Improvement of the software architecture and operational architecture

Comments

@yctercero
Copy link
Contributor

yctercero commented Feb 18, 2025

In our plugins, the following files use createPointInTimeFinder and createPointInTimeFinderDecryptedAsInternalUser. We should analyze whether we are loading all results into memory and apply mitigations if possible, as described in this ticket: #203017.

  • x-pack/plugins/lists/server/services/exception_lists/find_value_list_exception_list_items_point_in_time_finder.ts
  • x-pack/plugins/lists/server/services/exception_lists/find_exception_list_point_in_time_finder.ts
  • x-pack/plugins/lists/server/services/exception_lists/find_exception_list_items_point_in_time_finder.ts
@yctercero yctercero added impact:high Addressing this issue will have a high level of impact on the quality/strength of our product. Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. Team:Detection Engine Security Solution Detection Engine Area Team:Detections and Resp Security Detection Response Team technical debt Improvement of the software architecture and operational architecture labels Feb 18, 2025
@elasticmachine
Copy link
Contributor

Pinging @elastic/security-detection-engine (Team:Detection Engine)

@elasticmachine
Copy link
Contributor

Pinging @elastic/security-detections-response (Team:Detections and Resp)

@elasticmachine
Copy link
Contributor

Pinging @elastic/security-solution (Team: SecuritySolution)

@rylnd rylnd self-assigned this Feb 19, 2025
@yctercero yctercero removed the impact:high Addressing this issue will have a high level of impact on the quality/strength of our product. label Feb 19, 2025
@rylnd
Copy link
Contributor

rylnd commented Feb 19, 2025

Result of initial audit:

  1. It looks like the delete_list_route does perform an unbounded (size) call to findValueListExceptionListItemsPointInTimeFinder; however it is scoped to a particular list_id, which mitigates the issue somewhat. A perPage of 1000 is applied to the PIT query, but code is still sensitive to a list with a large number of items, as they'd all be loaded into memory.
  2. Again in delete_list_route, there is a similarly unbounded call to findExceptionListPointInTimeFinder. It is used when deleting exception items to verify that the item being deleted does not reference any other exception lists. When a list item is deleted, all exception lists referencing it will be loaded into memory here.
  3. When duplicating exception lists, we call findExceptionListsItemPointInTimeFinder with a maxSize of 10k to retrieve all existing items to be duplicated to the new list. Since we have a set maxSize, I think we can discount this one?

To summarize the above: we do not have any situations where we attempt to retrieve all known lists/items into memory. However, when deleting an exception list, there are two instances where we perform an unbounded retrieval (of items referenced by a value list, and exception lists referenced by those items, respectively) of items linked to the exception list being deleted. Since there is no limit to the number of value lists referenced in an exception list, nor to the number of exception lists referencing a value list, the size is still effectively unbounded.

@rylnd
Copy link
Contributor

rylnd commented Feb 20, 2025

@yctercero given the above, I can think of a few angles from which to address things:

  1. Add caps on the existing unbounded calls, and throw an error if the queries' size would exceed that. This would bound the situation moving forward, at the expense of forcing users to deal with these large (or largely shared) lists manually (since they could no longer delete them via the API).
  2. Continue with the existing behaviors, but somehow warn the user if they have lists exceeding 10k items/references
  3. Continue with the existing behaviors, but add telemetry around the sizes of these calls

@yctercero
Copy link
Contributor Author

@rylnd thanks so much for doing the analysis of our uses here.

Could you create a ticket for us to follow up on this and add to backlog?

A user could easily have more than 10k value list items. I'm not as worried about the number of value lists referenced in an exception list.

Add caps on the existing unbounded calls, and throw an error if the queries' size would exceed that. This would bound the situation moving forward, at the expense of forcing users to deal with these large (or largely shared) lists manually (since they could no longer delete them via the API).

This would be a breaking change and with value lists, they could very easily reach 10k+ so I'm a bit weary of going down this route.

Continue with the existing behaviors, but somehow warn the user if they have lists exceeding 10k items/references

What exactly would we warn them about? Would we need to create some kind of async route to deal with these instances?

Continue with the existing behaviors, but add telemetry around the sizes of these calls

We are prioritizing adding telemetry in 8.19, it may be worth adding a comment in this ticket to ensure we track this.

@rylnd
Copy link
Contributor

rylnd commented Feb 25, 2025

@yctercero I created #212460; let me know if that looks good to you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Team:Detection Engine Security Solution Detection Engine Area Team:Detections and Resp Security Detection Response Team Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. technical debt Improvement of the software architecture and operational architecture
Projects
None yet
Development

No branches or pull requests

3 participants