[RFC] Stage 0: Introduce Entity Field Set into ECS #2434

Merged
merged 9 commits into from
Mar 6, 2025

tinnytintin10
Contributor

Overview

An entity represents a discrete, identifiable component within an IT environment that can be described by a set of attributes and maintains its identity over time. Entities can be physical (like hosts or devices), logical (like containers or processes), or abstract (like applications or services).

Currently, ECS provides specific field sets for certain categories of entities (e.g., host, user, cloud, orchestrator) to capture their metadata. However, as IT infrastructure continues to evolve, we encounter an increasing number of entity types that don't cleanly fit into existing field sets – for example, storage services like S3, database instances like DynamoDB, or various other cloud services and IT-related infrastructure components (both digital and physical).

This RFC proposes a new entity field set that aims to solve this and several other challenges. It is currently at Stage 0 (strawperson), and I am seeking initial feedback on the approach and concept. See /rfcs/text/0049-entity-fields.md for more details.

PR Guidelines

  • Have you signed the contributor license agreement? ✅
  • Have you followed the contributor guidelines? ✅
  • For proposing substantial changes or additions to the schema, have you reviewed the [RFC process](https://github.com/elastic/ecs/blob/main/rfcs/README.md)? ✅
  • If submitting code/script changes, have you verified all tests pass locally using make test? N/A
  • If submitting schema/fields updates, have you generated new artifacts by running make and committed those changes? N/A
  • Is your pull request against main? Unless there is a good reason otherwise, we prefer pull requests against main and will backport as needed. ✅
  • Have you added an entry to the CHANGELOG.next.md? N/A

@tinnytintin10
Contributor Author

@MikePaquette @YulNaumenko, I've drafted the RFC to introduce the entity field set into ECS like we talked about. Before taking it out of draft, I wanted to check with you both to see if there's anything you think should be included or addressed as part of this stage. Lmk 🙏🏾

cla-checker-service bot commented Feb 24, 2025

💚 CLA has been signed

@tinnytintin10 tinnytintin10 marked this pull request as ready for review February 24, 2025 03:34
@tinnytintin10 tinnytintin10 requested a review from a team as a code owner February 24, 2025 03:34
@tinnytintin10
Contributor Author

tinnytintin10 commented Feb 24, 2025

Reviewed this with @MikePaquette, and we are good to go for broader reviews 🚀

cc @tehilashn @oren-zohar @YulNaumenko

Contributor

@mjwolf mjwolf left a comment

This looks good to me; my comments could also be addressed in future RFC stages. I'll wait for subject matter experts to review and for some of the other comments to be addressed before giving it the official approval, though.


| Field | Type | Description |
|-------|------|-------------|
| entity.id | keyword | A unique identifier for the entity. When multiple identifiers exist, this should be the most stable and commonly used identifier that: 1) persists across the entity's lifecycle, 2) ensures uniqueness within its scope, 3) is commonly used for queries and correlation, and 4) is readily available in most observations (logs/events). For entities with dedicated field sets (e.g., host, user), this value should match the corresponding *.id field. Alternative identifiers (e.g., ARN values in AWS, URLs) can be preserved in entity.raw. |
Contributor

I feel like this should be more concretely defined. It seems like "the most stable and commonly used identifier" could be subjective, and different implementors could choose different values for the same entity. I think it might be better if this listed a preferred order of IDs to use and stated that the highest-priority type that is known must be used.

Contributor Author

Good point @mjwolf. While I don't think it will be feasible to come up with a comprehensive priority list of preferred IDs for all possible entity data sources/types, I agree we should be more concrete.

I'll update the documentation to include specific examples for common entity types. For instance:

  • For AWS resources: prefer ARN when available
  • For GCP resources: prefer full resource name
  • For Azure resources: prefer Azure Resource ID
  • For Kubernetes resources: prefer namespace/name combination
  • For hosts: prefer FQDN, then hostname, then instance ID
  • For users: etc.,

This should help guide implementors while still allowing flexibility for entity types we haven't explicitly covered. What do you think? cc @romulets @maxcold
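As a rough illustration of the ordering listed above, here is a minimal sketch of "use the highest-priority identifier that is known". It is not part of the RFC: the function name `pick_entity_id`, the `ID_PRIORITY` table, the entity type keys, and the attribute names (`arn`, `full_resource_name`, `resource_id`, `fqdn`, `hostname`, `instance_id`) are all hypothetical names chosen for this example.

```python
from typing import Mapping, Optional

# Hypothetical priority table: entity type -> attribute names, most preferred first.
# The orderings follow the examples listed in the comment; the type keys and
# attribute names are illustrative and are not defined by the RFC.
ID_PRIORITY = {
    "aws": ["arn"],
    "gcp": ["full_resource_name"],
    "azure": ["resource_id"],
    "host": ["fqdn", "hostname", "instance_id"],
}


def pick_entity_id(entity_type: str, attrs: Mapping[str, str]) -> Optional[str]:
    """Return the highest-priority identifier that is known, else None."""
    if entity_type == "kubernetes":
        # Namespace/name combination; both parts are required.
        namespace, name = attrs.get("namespace"), attrs.get("name")
        return f"{namespace}/{name}" if namespace and name else None
    for key in ID_PRIORITY.get(entity_type, []):
        value = attrs.get(key)
        if value:  # skip missing or empty values
            return value
    return None


# No FQDN is known here, so the hostname wins over the instance ID.
print(pick_entity_id("host", {"hostname": "web-01", "instance_id": "i-0abc"}))  # web-01
```

Entity types without an entry in the table return `None`, which leaves the choice to the implementor, in line with the flexibility mentioned above.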

Member

I agree! I think having one preferred ID in the documentation itself is better than what it is right now.

Member

I really like this breakdown, since it answers a lot of questions that keep popping up. Especially stuff like GCP, where there isn't one defined ARN-like format.

Contributor Author

@tinnytintin10 tinnytintin10 Mar 6, 2025

That sounds good! I will incorporate this "guidance" into the subsequent stages of the RFC process 👍🏾

| Field | Type | Description |
|-------|------|-------------|
| entity.id | keyword | A unique identifier for the entity. When multiple identifiers exist, this should be the most stable and commonly used identifier that: 1) persists across the entity's lifecycle, 2) ensures uniqueness within its scope, 3) is commonly used for queries and correlation, and 4) is readily available in most observations (logs/events). For entities with dedicated field sets (e.g., host, user), this value should match the corresponding *.id field. Alternative identifiers (e.g., ARN values in AWS, URLs) can be preserved in entity.raw. |
| entity.source | keyword | The module or integration that provided this entity data (similar to event.module). |
Contributor

Could this just be replaced by event.module? Could you define exactly how they are different?

@JordanSh JordanSh left a comment

🚀

@maxcold maxcold self-requested a review March 6, 2025 12:28
Member

@kubasobon kubasobon left a comment

Left a comment on entity.id. Looks good to me!

@mjwolf mjwolf merged commit 486442b into main Mar 6, 2025
4 of 5 checks passed
8 participants