[RFC] Stage 0: Introduce Entity Field Set into ECS #2434
@MikePaquette @YulNaumenko, I've drafted the RFC to introduce the entity field set into ECS like we talked about. Before taking it out of draft, I wanted to check with you both to see if there's anything you think should be included or addressed as part of this stage. Lmk 🙏🏾
… fields (like event.url)
Reviewed this with @MikePaquette, and we are good to go for broader reviews 🚀
This looks good to me; my comments could also be addressed in future RFC stages. I'll wait for subject matter experts to review and for some of the other comments to be addressed before giving it the official approval, though.
| Field | Type | Description |
|-------|------|-------------|
| entity.id | keyword | A unique identifier for the entity. When multiple identifiers exist, this should be the most stable and commonly used identifier that: 1) persists across the entity's lifecycle, 2) ensures uniqueness within its scope, 3) is commonly used for queries and correlation, and 4) is readily available in most observations (logs/events). For entities with dedicated field sets (e.g., host, user), this value should match the corresponding *.id field. Alternative identifiers (e.g., ARNs values in AWS, URLs) can be preserved in entity.raw. |
I feel like this should be more concretely defined. It seems like "the most stable and commonly used identifier" could be subjective, and different implementors could choose different values for the same entity. I think it might be better if this listed a preferred order of IDs to use and stated that the highest-priority type that's known must be used.
Good point @mjwolf. While I don't think it will be feasible to come up with a comprehensive priority list of preferred IDs for all possible entity data sources/types, I agree we should be more concrete.
I'll update the documentation to include specific examples for common entity types. For instance:
- For AWS resources: prefer ARN when available
- For GCP resources: prefer full resource name
- For Azure resources: prefer Azure Resource ID
- For Kubernetes resources: prefer namespace/name combination
- For hosts: prefer FQDN, then hostname, then instance ID
- For users: etc.,
This should help guide implementors while still allowing flexibility for entity types we haven't explicitly covered. What do you think? cc @romulets @maxcold
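To make the idea concrete, here is a minimal sketch of how an implementor might apply a per-type preference order like the one listed above. It is only an illustration under stated assumptions: the `ID_PREFERENCE` table, the candidate key names (`arn`, `full_resource_name`, `resource_id`, `namespace_name`, `fqdn`, `hostname`, `instance_id`), and the `pick_entity_id` helper are all hypothetical and not part of ECS or the RFC.

```python
from typing import Mapping, Optional

# Hypothetical preference order per entity type, mirroring the examples above.
# Neither the type names nor the candidate keys are defined by ECS.
ID_PREFERENCE = {
    "aws": ["arn"],
    "gcp": ["full_resource_name"],
    "azure": ["resource_id"],
    "kubernetes": ["namespace_name"],  # e.g. "default/my-pod"
    "host": ["fqdn", "hostname", "instance_id"],
}


def pick_entity_id(
    entity_type: str, candidates: Mapping[str, Optional[str]]
) -> Optional[str]:
    """Return the highest-priority identifier that is actually present.

    Returns None when the entity type is not covered or when no preferred
    identifier is available, leaving the choice to the implementor.
    """
    for key in ID_PREFERENCE.get(entity_type, []):
        value = candidates.get(key)
        if value:  # skip missing or empty identifiers
            return value
    return None
```

With this sketch, a host that reports only a hostname and an instance ID would get the hostname as its `entity.id`, while an uncovered entity type falls through to `None` so the implementor can still decide.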
I agree! I think having one preferred ID in the documentation itself is better than what it is right now.
I really like this breakdown, since it answers a lot of questions that keep popping up. Especially stuff like GCP, where there isn't one defined ARN-like format.
That sounds good! I will incorporate this "guidance" into the subsequent stages of the RFC process 👍🏾
| Field | Type | Description |
|-------|------|-------------|
| entity.id | keyword | A unique identifier for the entity. When multiple identifiers exist, this should be the most stable and commonly used identifier that: 1) persists across the entity's lifecycle, 2) ensures uniqueness within its scope, 3) is commonly used for queries and correlation, and 4) is readily available in most observations (logs/events). For entities with dedicated field sets (e.g., host, user), this value should match the corresponding *.id field. Alternative identifiers (e.g., ARNs values in AWS, URLs) can be preserved in entity.raw. |
| entity.source | keyword | The module or integration that provided this entity data (similar to event.module). |
Could this just be replaced by event.module? Could you define exactly how they are different?
2fbfa28 to 1fc6cff
🚀
Left a comment on entity.id. Looks good to me!
Overview
An entity represents a discrete, identifiable component within an IT environment that can be described by a set of attributes and maintains its identity over time. Entities can be physical (like hosts or devices), logical (like containers or processes), or abstract (like applications or services).
Currently, ECS provides specific field sets for certain categories of entities (e.g., host, user, cloud, orchestrator) to capture their metadata. However, as IT infrastructure continues to evolve, we encounter an increasing number of entity types that don't cleanly fit into existing field sets – for example, storage services like S3, database instances like DynamoDB, or various other cloud services and IT-related infrastructure components (both digital and physical).
This RFC proposes a new entity field set that aims to solve this and several other challenges. It is currently at Stage 0 (strawperson), and we are seeking initial feedback on the approach and concept. See /rfcs/text/0049-entity-fields.md for more details.
PR Guidelines
make test? N/A
make and committed those changes? N/A