Request for adding data generation for system package #6168

Open
maryam-saeidi opened this issue May 11, 2023 · 14 comments
Labels
Integration:system System Team:Service-Integrations Label for the Service Integrations team

Comments

@maryam-saeidi
Member

maryam-saeidi commented May 11, 2023

Summary

As an Actionable Observability team member, I am looking for a way to generate system data (matching what Metricbeat/Elastic Agent produces) to test infra-alert rules. At the moment, I am using high_cardinality_indexer to generate it from the fake_host template.

This approach has the following challenges:

  1. In case of any change in metricbeat/elastic agent, this fake data will get out of sync
  2. At the moment, we need to know all the existing fields to generate fake data with enough information. Having a system package that generates data out of the box helps eliminate the need for the test writer to be aware of all the existing fields.

My end goal is to use this tool in Kibana for API integration testing of infra-alert rules.

Related topic

@ruflin
Collaborator

ruflin commented May 12, 2023

@andresrc @aspacca @bturquet @tommyers-elastic As soon as the implementation of the corpus spec is done in elastic-package, we should move over the existing specs and then tackle the system integration, or at least a subset of its datasets, as I expect this to be one of the most requested datasets for testing.

@maryam-saeidi Can you share a bit more detail on exactly which metrics you are interested in? From the links you shared, it seems to be CPU and network. Any others?

ruflin added the Team:Service-Integrations label May 12, 2023
@maryam-saeidi
Member Author

@ruflin At the moment, I would like to test adding a condition for CPU usage and getting alerts related to the hosts that met that threshold.
Suppose I have three hosts and only one of them has CPU usage above 90 (host-1: 50, host-2: 20, host-3: 95). Then I expect the alert document to have host.name: host-3 and related host information according to https://www.elastic.co/guide/en/ecs/current/ecs-host.html
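A minimal sketch of that scenario, assuming flat dotted field names and that the rule condition is on `system.cpu.total.norm.pct` expressed as a 0..1 ratio. The helper names `build_docs` and `hosts_over_threshold` are hypothetical and only illustrate which host the test should expect in the alert:

```python
from datetime import datetime, timezone

# Hypothetical CPU usage per host, in percent, mirroring the example above.
CPU_BY_HOST = {"host-1": 50, "host-2": 20, "host-3": 95}


def build_docs(cpu_by_host, now=None):
    """Build one flat metric document per host (field choice is an assumption)."""
    now = now or datetime.now(timezone.utc)
    return [
        {
            "@timestamp": now.isoformat(),
            "host.name": name,
            # Percent converted to a ratio, assuming the rule compares ratios.
            "system.cpu.total.norm.pct": pct / 100,
        }
        for name, pct in cpu_by_host.items()
    ]


def hosts_over_threshold(docs, threshold=0.9):
    """Return the sorted host names whose CPU ratio is above the threshold."""
    return sorted(
        {d["host.name"] for d in docs if d["system.cpu.total.norm.pct"] > threshold}
    )


docs = build_docs(CPU_BY_HOST)
print(hosts_over_threshold(docs))  # prints ['host-3']
```

Only host-3 crosses the 90% threshold, so that is the single host the alert document is expected to carry.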

Now my question is: How do ECS host fields relate to system integration? Can I expect all those fields to be added?

Also, regarding what system fields we can alert based on, I checked release-oblt (Create rule > Metric threshold > condition field), and I see we have a lot of system fields:

[Screenshot: system fields available in the Metric threshold condition field]

But for my test, CPU, Memory, and Network fields are enough to start (plus the ECS host fields if it is applicable)

@ruflin
Collaborator

ruflin commented May 12, 2023

Now my question is: How do ECS host fields relate to system integration? Can I expect all those fields to be added?

Yes

@aspacca
Contributor

aspacca commented May 15, 2023

Now my question is: How do ECS host fields relate to system integration? Can I expect all those fields to be added?

this is true for schema-c:

{ "@timestamp": "2023-05-15T17:35:06.228332+09:00","agent.id": "rapidfriend","cloud.account.id": "azurecowl","cloud.availability_zone": "sage-raver-pearweasel","cloud.image.id": "blueeater","cloud.instance.id": "taker-sulpherhead","cloud.instance.name": "dirtridge","cloud.machine.type": "eatergossamerknife","cloud.project.id": "battleforger","cloud.provider": "hazelfairy","cloud.region": "mustang-flier-oilwhip","container.id": "quartzfalcon","container.image.name": "liefalcon","container.labels.belly": "jellyleg","container.labels.hand": "nebulacougar","container.labels.hyena": "cypressminnow","container.name": "grovesnout","data_stream.dataset": "honeysucklestallion","data_stream.namespace": "muckdeer","data_stream.type": "planetdevourer","event.dataset": "system.cpu","event.module": "system","host.architecture": "stealer_translucenthyena","host.containerized": true,"host.cpu.pct": 4.155775,"host.domain": "crackox","host.hostname": "sunsettrader","host.id": "meadowcarpet","host.ip": "182.39.195.123","host.mac": "streamocelot","host.name": "stripedive","host.os.build": "nimblesparrow","host.os.codename": "ceruleanbug","host.os.family": "timefrill","host.os.full": "coconutcharger","host.os.kernel": "scowl-salmon-belly-chiller-rootgrasp","host.os.name": "runner scourge leathergem","host.os.platform": "motleyjay","host.os.version": "scorpionstalkerbigmark","host.type": "feathercrafter","system.cpu.cores": 4,"system.cpu.idle.norm.pct": 4.602111,"system.cpu.idle.pct": 2.933579,"system.cpu.idle.ticks": 3,"system.cpu.iowait.norm.pct": 0.198540,"system.cpu.iowait.pct": 6.837662,"system.cpu.iowait.ticks": 6,"system.cpu.irq.norm.pct": 7.056231,"system.cpu.irq.pct": 6.143894,"system.cpu.irq.ticks": 4,"system.cpu.nice.norm.pct": 5.907551,"system.cpu.nice.pct": 0.178689,"system.cpu.nice.ticks": 3,"system.cpu.softirq.norm.pct": 3.495727,"system.cpu.softirq.pct": 8.562177,"system.cpu.softirq.ticks": 6,"system.cpu.steal.norm.pct": 1.507343,"system.cpu.steal.pct": 
8.160910,"system.cpu.steal.ticks": 4,"system.cpu.system.norm.pct": 5.321610,"system.cpu.system.pct": 2.223324,"system.cpu.system.ticks": 6,"system.cpu.total.norm.pct": 4.868852,"system.cpu.total.pct": 8.012242,"system.cpu.user.norm.pct": 4.213471,"system.cpu.user.pct": 3.027456,"system.cpu.user.ticks": 1 }

I still have to investigate whether the tool supports ECS fields coming from https://github.com/elastic/integrations/blob/main/packages/system/data_stream/cpu/fields/ecs.yml, or whether they appear in the output only because they are also defined in https://github.com/elastic/integrations/blob/main/packages/system/data_stream/cpu/fields/agent.yml

If you want to generate schema-c data (i.e. post-ingest-pipeline data, which means you should disable the ingest pipeline when ingesting into metrics-system.cpu-default), you only need to launch the tool with the following arguments: generate system cpu 1.28.0 -t 200KB (change the size according to your needs).

Please be aware, as discussed, that unless you can tweak the generated data through the fields generation configuration so that it triggers the rule you want to test, you cannot be sure that the generated data will contain events that trigger that rule.
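One way to find out whether a generated corpus happens to contain triggering events is to scan it before running the test. The sketch below assumes the output is newline-delimited JSON with flat dotted keys, like the sample document above; the function name `triggering_events`, the chosen field, and the two hand-written events are hypothetical:

```python
import json


def triggering_events(ndjson_text, field, threshold):
    """Yield parsed events whose numeric `field` value is above `threshold`."""
    for line in ndjson_text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between events
        event = json.loads(line)
        value = event.get(field)
        if isinstance(value, (int, float)) and value > threshold:
            yield event


# Two hand-written events standing in for generated output (hypothetical values).
corpus = "\n".join(
    [
        '{"host.name": "stripedive", "system.cpu.total.norm.pct": 0.42}',
        '{"host.name": "sunsettrader", "system.cpu.total.norm.pct": 0.97}',
    ]
)

hits = list(triggering_events(corpus, "system.cpu.total.norm.pct", 0.9))
print([e["host.name"] for e in hits])  # prints ['sunsettrader']
```

If the scan comes back empty, the corpus cannot activate the rule as generated, and the values would have to be adjusted first.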

For that, https://github.com/elastic/geneve is a better tool, but it is more limited when it comes to generating all the fields of the document. I think there is some way to generate both the fields affecting the rule and the ECS ones through Geneve; @cavokz might be more helpful here.

@cavokz

cavokz commented May 15, 2023

Thanks @aspacca. Ccing @charlie-pichette.

@maryam-saeidi, Geneve is not very good at generating realistic data, neither in the fields of the generated documents nor in the content of those fields. What Geneve is good at is adding the fields mentioned in a query and filling them with content that satisfies that query, and therefore a rule.

If for example you have this query (not sure if I got the units right here):

any where host.cpu.usage >= 0.90 and _cardinality(host.name, 3)

You would get something similar to

{'host': {'cpu': {'usage': 0.9496416550389374}, 'name': 'FEF'}, '@timestamp': '2023-05-15T11:28:54.940+02:00'}
{'host': {'cpu': {'usage': 0.9043770541301733}, 'name': 'sgV'}, '@timestamp': '2023-05-15T11:28:54.940+02:00'}
{'host': {'cpu': {'usage': 0.9089471310908367}, 'name': 'SzF'}, '@timestamp': '2023-05-15T11:28:54.940+02:00'}
{'host': {'cpu': {'usage': 0.918347364858316}, 'name': 'FEF'}, '@timestamp': '2023-05-15T11:28:54.940+02:00'}
{'host': {'cpu': {'usage': 0.913752499159961}, 'name': 'FEF'}, '@timestamp': '2023-05-15T11:28:54.941+02:00'}
{'host': {'cpu': {'usage': 0.9687020191511078}, 'name': 'SzF'}, '@timestamp': '2023-05-15T11:28:54.941+02:00'}
{'host': {'cpu': {'usage': 0.952194248562828}, 'name': 'FEF'}, '@timestamp': '2023-05-15T11:28:54.941+02:00'}
{'host': {'cpu': {'usage': 0.9972572771906527}, 'name': 'SzF'}, '@timestamp': '2023-05-15T11:28:54.941+02:00'}
{'host': {'cpu': {'usage': 0.9790489383951492}, 'name': 'FEF'}, '@timestamp': '2023-05-15T11:28:54.941+02:00'}
{'host': {'cpu': {'usage': 0.9587031853062025}, 'name': 'FEF'}, '@timestamp': '2023-05-15T11:28:54.941+02:00'}
...

You see that aside from @timestamp no other fields are generated. host.name is random garbage, but with _cardinality(host.name, 3) you get just three distinct values. host.cpu.usage will contain random numbers between 0.9 and 1.0 (inclusive).
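To make that behaviour concrete, here is a rough imitation in plain Python. It is not Geneve's implementation or API, just a sketch that produces documents of the same shape under the same two constraints; the function name `generate` and its parameters are hypothetical:

```python
import random
import string
from datetime import datetime


def generate(count, min_usage=0.90, cardinality=3, seed=None):
    """Return `count` documents with usage >= min_usage and a bounded set of host names."""
    rng = random.Random(seed)
    # Pick `cardinality` distinct random three-letter names up front.
    names = set()
    while len(names) < cardinality:
        names.add("".join(rng.choice(string.ascii_letters) for _ in range(3)))
    pool = sorted(names)
    docs = []
    for _ in range(count):
        docs.append(
            {
                "host": {
                    "cpu": {"usage": rng.uniform(min_usage, 1.0)},
                    "name": rng.choice(pool),
                },
                "@timestamp": datetime.now().astimezone().isoformat(timespec="milliseconds"),
            }
        )
    return docs


for doc in generate(5, seed=42):
    print(doc)
```

Every document satisfies the threshold part of the query, and no more than three distinct host names ever appear, but nothing else about the documents resembles real system metrics.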

If this is something that interests you, we need to find the way to integrate Geneve with tools that generate better "background" data on top of which Geneve can adjust/add the fields as needed.
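As a thought experiment, such an integration could be as simple as overlaying the rule-driven fields on a background document. The sketch below assumes the background data uses flat dotted keys and the rule-driven fields arrive nested, as in the outputs shown in this thread; `flatten` and `overlay` are hypothetical helpers, not part of either tool:

```python
def flatten(doc, prefix=""):
    """Turn a nested document into flat dotted keys, e.g. host -> name becomes host.name."""
    flat = {}
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{path}."))
        else:
            flat[path] = value
    return flat


def overlay(background, rule_fields):
    """Copy the background document and let the rule-driven fields win on conflicts."""
    merged = dict(background)
    merged.update(flatten(rule_fields))
    return merged


# Hypothetical background event and rule-driven fields.
background = {
    "event.dataset": "system.cpu",
    "host.name": "stripedive",
    "host.cpu.usage": 0.12,
}
rule_fields = {"host": {"cpu": {"usage": 0.95}, "name": "FEF"}}

print(overlay(background, rule_fields))
```

The background fields that the rule does not care about (here `event.dataset`) survive untouched, while `host.name` and `host.cpu.usage` take the values that satisfy the query.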

@ruflin
Collaborator

ruflin commented May 15, 2023

Elastic has quite a few data generation tools out there. Since in many observability cases the data we are interested in comes from packages, for system metrics I would rather focus on the data generated by elastic-package and extend it for these use cases than extend Geneve.

@cavokz

cavokz commented May 15, 2023

Indeed, I was thinking of integrating Geneve with other tools rather than extending it.

For instance we already evaluated the idea of adding support for package-integrations in Geneve (elastic/geneve#113) and concluded that it's not a good idea.

@charlie-pichette

@maryam-saeidi https://github.com/elastic/logen may also be of value.

@maryam-saeidi
Member Author

@charlie-pichette I get 404 when I try to access the repo

@charlie-pichette

Perhaps @tammytorbert can provide access to Logen.

@aspacca
Contributor

aspacca commented May 16, 2023

@ruflin

for system metrics I would rather focus on the data generated by elastic-package and extend it for these use cases than extend Geneve.

We will for sure create the assets for the system metrics in elastic-package. Still, for @maryam-saeidi's use case it might not be the right solution, because it cannot create data that triggers a rule.

@cavokz

we need to find the way to integrate Geneve with tools that generate better "background" data on top of which Geneve can adjust/add the fields as needed.

as @ruflin mentioned, in the context of observability "the data we are interested in comes from packages", and that's what the corpus generator tool handles very well

but it lacks a way to drive the data according to a query/rule across multiple events

We talked a while ago about making the two tools somehow able to "speak to each other", and I see @maryam-saeidi's scenario as a good one to start building upon: what do you think?

maryam-saeidi added a commit to elastic/kibana that referenced this issue May 16, 2023
## Summary

Closes #157189

This PR adds a metric threshold integration test. This is the first step
in adding more test coverage for observability rules.

**Steps during the test**
1. Generating fake host data by using a similar implementation as
https://github.com/elastic/high-cardinality-cluster
    - Data is generated for the last 15 mins
    - Implementation was simplified only to cover fake hosts and was
      converted to TypeScript
2. Creating an action using an index connector
3. Creating a metric threshold rule that uses the action from step 2
4. Checking the status of the rule to be active
5. Checking the triggered action to have the correct parameters
6. Checking the generated alert to have the correct information
7. Clean up

**How to run locally**
- Run server
```
node scripts/functional_tests_server --config x-pack/test/api_integration/apis/metrics_ui/config.ts
```
- Then run the test
```
node scripts/functional_test_runner --include x-pack/test/api_integration/apis/metrics_ui/metric_threshold_rule.ts --config x-pack/test/api_integration/apis/metrics_ui/config.ts
```

**Reference**
I created elastic/integrations#6168 to find a
better way to generate data and make sure that the data matches what
Metricbeat generates

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
jasonrhodes pushed a commit to elastic/kibana that referenced this issue May 17, 2023
@botelastic

botelastic bot commented May 15, 2024

Hi! We just realized that we haven't looked into this issue in a while. We're sorry! We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1:. Thank you for your contribution!

@botelastic botelastic bot added the Stalled label May 15, 2024
@ruflin
Collaborator

ruflin commented May 15, 2024

@lalit-satapathy ^ Would be great to get this in as it would help with development and testing.

@botelastic botelastic bot removed the Stalled label May 15, 2024
@lalit-satapathy
Collaborator

@lalit-satapathy ^ Would be great to get this in as it would help with development and testing.

Yes, will help on this.

But for my test, CPU, Memory, and Network fields are enough to start (plus the ECS host fields if it is applicable)

@maryam-saeidi, we already have the rally benchmark supported for system.cpu and system.memory. Is this something you can give a try, so that we can extend it to system.network in the future? If you need help running the corpus generator tool, please let us know.


7 participants