Merge branch 'main' into cloud_account_refinement
eedugon authored Feb 4, 2025
2 parents 6285381 + 6329ba3 commit 050bca0
Showing 26 changed files with 115 additions and 360 deletions.
25 changes: 7 additions & 18 deletions explore-analyze/machine-learning.md
mapped_urls:

# What is Elastic Machine Learning? [machine-learning-intro]

{{ml-cap}} features analyze your data and generate models for its patterns of behavior. The type of analysis that you choose depends on the questions or problems you want to address and the type of data you have available.

## Unsupervised {{ml}} [machine-learning-unsupervised]

There are two types of analysis that can deduce the patterns and relationships within your data without training or intervention: *{{anomaly-detect}}* and *{{oldetection}}*.

[{{anomaly-detect-cap}}](machine-learning/anomaly-detection.md) requires time series data. It constructs a probability model and can run continuously to identify unusual events as they occur. The model evolves over time; you can use its insights to forecast future behavior.

[{{oldetection-cap}}](machine-learning/data-frame-analytics/ml-dfa-finding-outliers.md) does not require time series data. It is a type of {{dfanalytics}} that identifies unusual points in a data set by analyzing how close each data point is to others and the density of the cluster of points around it. It does not run continuously; it generates a copy of your data set where each data point is annotated with an {{olscore}}. The score indicates the extent to which a data point is an outlier compared to other data points.

## Supervised {{ml}} [machine-learning-supervised]

There are two types of {{dfanalytics}} that require training data sets: *{{classification}}* and *{{regression}}*.

In both cases, the result is a copy of your data set in which each data point is annotated with predictions, plus a trained model that you can deploy to make predictions for new data. For more information, refer to [Introduction to supervised learning](machine-learning/data-frame-analytics/ml-dfa-overview.md#ml-supervised-workflow).

[{{classification-cap}}](machine-learning/data-frame-analytics/ml-dfa-classification.md) learns relationships between your data points in order to predict discrete categorical values, such as whether a DNS request originates from a malicious or benign domain.
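
As an illustration, the sketch below creates a {{classification}} {{dfanalytics}} job with the create data frame analytics job API. The job ID, index names, and dependent variable are hypothetical placeholders.

```console
# Hypothetical example: train a classifier on labeled DNS data.
# Source and destination indices, job ID, and field names are placeholders.
PUT _ml/data_frame/analytics/dns-classification-demo
{
  "source": { "index": "dns-training-data" },
  "dest": { "index": "dns-classification-results" },
  "analysis": {
    "classification": {
      "dependent_variable": "is_malicious",
      "training_percent": 80
    }
  }
}
```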

The {{ml-features}} that are available vary by project type:

## Synchronize saved objects [machine-learning-synchronize-saved-objects]

Before you can view your {{ml}} {{dfeeds}}, jobs, and trained models in {{kib}}, they must have saved objects. For example, if you used APIs to create your jobs, wait for automatic synchronization or go to the **{{ml-app}}** page and click **Synchronize saved objects**.

## Export and import jobs [machine-learning-export-and-import-jobs]

You can export and import your {{ml}} job and {{dfeed}} configuration details on the **{{ml-app}}** page. For example, you can export jobs from your test environment and import them in your production environment.

The exported file contains configuration details; it does not contain the {{ml}} models. For {{anomaly-detect}}, you must import and run the job to build a model that is accurate for the new environment. For {{dfanalytics}}, trained models are portable; you can import the job then transfer the model to the new cluster. Refer to [Exporting and importing {{dfanalytics}} trained models](machine-learning/data-frame-analytics/ml-trained-models.md#export-import).

There are some additional actions that you must take before you can successfully import and run your jobs:

16 changes: 8 additions & 8 deletions explore-analyze/machine-learning/anomaly-detection.md
mapped_urls:
- https://www.elastic.co/guide/en/kibana/current/xpack-ml-anomalies.html
---

# Anomaly detection [ml-ad-overview]

% What needs to be done: Align serverless/stateful
You can use {{stack}} {{ml-features}} to analyze time series data and identify anomalous patterns in your data set.

% Scope notes: Colleen McGinnis removed "https://www.elastic.co/guide/en/serverless/current/observability-machine-learning.html" and "All children" because this page is also used below in "AIOps Labs" with "All children" selected. We can't copy all children to two places.

% Use migrated content from existing pages that map to this page:

% - [ ] ./raw-migrated-files/stack-docs/machine-learning/ml-ad-overview.md
% - [ ] ./raw-migrated-files/kibana/kibana/xpack-ml-anomalies.md
* [Finding anomalies](../../../explore-analyze/machine-learning/anomaly-detection/ml-ad-finding-anomalies.md)
* [Tutorial: Getting started with {{anomaly-detect}}](../../../explore-analyze/machine-learning/anomaly-detection/ml-getting-started.md)
* [*Advanced concepts*](../../../explore-analyze/machine-learning/anomaly-detection/ml-ad-concepts.md)
* [*API quick reference*](../../../explore-analyze/machine-learning/anomaly-detection/ml-api-quickref.md)
* [How-tos](../../../explore-analyze/machine-learning/anomaly-detection/anomaly-how-tos.md)
* [*Resources*](../../../explore-analyze/machine-learning/anomaly-detection/ml-ad-resources.md)
Prerequisites:

The following recommendations are not sequential – the numbers just help to navigate between the list items; you can take action on one or more of them in any order. You can implement some of these changes on existing jobs; others require you to clone an existing job or create a new one.


## 1. Consider autoscaling, node sizing, and configuration [node-sizing]

An {{anomaly-job}} runs on a single node and requires sufficient resources to hold its model in memory. When a job is opened, it will be placed on the node with the most available memory at that time.

Increasing the number of nodes will allow distribution of job processing as well as fault tolerance.

In {{ecloud}}, you can enable [autoscaling](../../../deploy-manage/autoscaling.md) so that the {{ml}} nodes in your cluster scale up or down based on current {{ml}} memory and CPU requirements. The {{ecloud}} infrastructure allows you to create {{ml-jobs}} up to the size that fits on the maximum node size that the cluster can scale to (usually somewhere between 58GB and 64GB) rather than what would fit in the current cluster. If you attempt to use autoscaling outside of {{ecloud}}, then set `xpack.ml.max_ml_node_size` to define the maximum possible size of a {{ml}} node. Creating {{ml-jobs}} with model memory limits larger than the maximum node size can support is not allowed, as autoscaling cannot add a node big enough to run the job. On a self-managed deployment, you can set `xpack.ml.max_model_memory_limit` according to the available resources of the {{ml}} node. This prevents you from creating jobs with model memory limits too high to open in your cluster.
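
For example, the following sketch sets both limits as dynamic cluster settings; the values are illustrative and should match your own node sizes (on some versions you may need to set them in `elasticsearch.yml` instead).

```console
# Illustrative values only; size these to your own hardware.
PUT _cluster/settings
{
  "persistent": {
    "xpack.ml.max_ml_node_size": "64gb",
    "xpack.ml.max_model_memory_limit": "10gb"
  }
}
```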


## 2. Use dedicated results indices [dedicated-results-index]

For large jobs, use a dedicated results index. This ensures that results from a single large job do not dominate the shared results index. It also ensures that the job and results (if `results_retention_days` is set) can be deleted more efficiently and improves renormalization performance. By default, {{anomaly-job}} results are stored in a shared index. To change to use a dedicated result index, you need to clone or create a new job.
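
For example, a sketch of creating a new job with its own results index via `results_index_name`; the job ID, detector, and index name are hypothetical.

```console
# Hypothetical job with a dedicated results index.
# Elasticsearch prefixes the value, so results land in a custom .ml-anomalies-* index.
PUT _ml/anomaly_detectors/large-job-dedicated-results
{
  "analysis_config": {
    "bucket_span": "15m",
    "detectors": [ { "function": "count" } ]
  },
  "data_description": { "time_field": "@timestamp" },
  "results_index_name": "large-job-dedicated-results"
}
```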


## 3. Disable model plot [model-plot]

By default, model plot is enabled when you create jobs in {{kib}}. If you have a large job, however, consider disabling it. You can disable model plot for existing jobs by using the [Update {{anomaly-jobs}} API](https://www.elastic.co/guide/en/elasticsearch/reference/current/ml-update-job.html).

Model plot calculates and stores the model bounds for each analyzed entity, including both anomalous and non-anomalous entities. These bounds are used to display the shaded area in the Single Metric Viewer charts. Model plot creates one result document per bucket per split field value. If you have high cardinality fields and/or a short bucket span, disabling model plot reduces processing workload and results stored.
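
For instance, a sketch of turning model plot off for an existing job (the job ID is a placeholder):

```console
# Disable model plot on a hypothetical existing job.
POST _ml/anomaly_detectors/my-high-cardinality-job/_update
{
  "model_plot_config": { "enabled": false }
}
```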


## 4. Understand how detector configuration can impact model memory [detector-configuration]

The following factors are most significant in increasing the memory required for a job:

If you have high cardinality `by` or `partition` fields, ensure you have sufficient memory resources.

To change partitioning fields, influencers and/or detectors, you need to clone or create a new job.


## 5. Optimize the bucket span [optimize-bucket-span]

Short bucket spans and high cardinality detectors are resource intensive and require more system resources.

Bucket span is typically between 15m and 1h. The recommended value always depends on the data, the use case, and the latency required for alerting. A job with a longer bucket span uses less resources because fewer buckets require processing and fewer results are written. Bucket spans that are sensible dividers of an hour or day work best as most periodic patterns have a daily cycle.

If your use case is suitable, consider increasing the bucket span to reduce processing workload. To change the bucket span, you need to clone or create a new job.
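
The sketch below ties together the previous two recommendations: a single partitioned detector, one influencer, and a one-hour bucket span. Job and field names are hypothetical, and because partitioning fields, influencers, and bucket span are fixed at creation time, applying them means cloning or creating a new job.

```console
# Hypothetical job: longer bucket span and a single partitioned detector.
# High-cardinality partition fields multiply the number of models held in memory.
PUT _ml/anomaly_detectors/partitioned-traffic-job
{
  "analysis_config": {
    "bucket_span": "1h",
    "detectors": [
      {
        "function": "mean",
        "field_name": "bytes_sent",
        "partition_field_name": "host.name"
      }
    ],
    "influencers": [ "host.name" ]
  },
  "data_description": { "time_field": "@timestamp" }
}
```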


## 6. Set the `scroll_size` of the {{dfeed}} [set-scroll-size]

This consideration only applies to {{dfeeds}} that **do not** use aggregations. The `scroll_size` parameter of a {{dfeed}} specifies the number of hits to return from {{es}} searches. The higher the `scroll_size`, the more results are returned by a single search. When your {{anomaly-job}} has a high throughput, increasing `scroll_size` may decrease the time the job needs to analyze incoming data; however, it may also increase the pressure on your cluster. You cannot increase `scroll_size` beyond the value of `index.max_result_window`, which is 10,000 by default. If you update the settings of a {{dfeed}}, you must stop and start the {{dfeed}} for the change to be applied.
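
For example, a sketch of updating `scroll_size` on a hypothetical {{dfeed}}, stopping it first and restarting it afterwards:

```console
# Datafeed ID is a placeholder.
POST _ml/datafeeds/datafeed-high-throughput-job/_stop

POST _ml/datafeeds/datafeed-high-throughput-job/_update
{
  "scroll_size": 5000
}

POST _ml/datafeeds/datafeed-high-throughput-job/_start
```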


## 7. Set the model memory limit [set-model-memory-limit]

The `model_memory_limit` job configuration option sets the approximate maximum amount of memory resources required for analytical processing. When you create an {{anomaly-job}} in {{kib}}, it provides an estimate for this limit. The estimate is based on the analysis configuration details for the job and cardinality estimates, which are derived by running aggregations on the source indices as they exist at that specific point in time.

If you change the resources available on your {{ml}} nodes or make significant changes to the characteristics or cardinality of your data, the model memory requirements might also change. You can update the model memory limit for a job while it is closed. If you want to decrease the limit below the current model memory usage, however, you must clone and re-run the job.

::::{tip}
You can view the current model size statistics with the [get {{anomaly-job}} stats](https://www.elastic.co/guide/en/elasticsearch/reference/current/ml-get-job-stats.html) and [get model snapshots](https://www.elastic.co/guide/en/elasticsearch/reference/current/ml-get-snapshot.html) APIs. You can also obtain a model memory limit estimate at any time by running the [estimate {{anomaly-jobs}} model memory API](https://www.elastic.co/guide/en/elasticsearch/reference/current/ml-estimate-model-memory.html). However, you must provide your own cardinality estimates.
::::


As a job approaches its model memory limit, the memory status is `soft_limit` and older models are more aggressively pruned to free up space. If you have categorization jobs, no further examples are stored. When a job exceeds its limit, the memory status is `hard_limit` and the job no longer models new entities. It is therefore important to have appropriate model memory limits for each job. If you reach the hard limit and are concerned about the missing data, ensure that you have adequate resources, then clone and re-run the job with a larger model memory limit.
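
As a sketch, you might first estimate the requirement with your own cardinality figures and then raise the limit on a closed job; the job ID, fields, and numbers are hypothetical.

```console
# Estimate memory for a hypothetical configuration; cardinalities are your own estimates.
POST _ml/anomaly_detectors/_estimate_model_memory
{
  "analysis_config": {
    "bucket_span": "1h",
    "detectors": [
      { "function": "mean", "field_name": "bytes_sent", "partition_field_name": "host.name" }
    ],
    "influencers": [ "host.name" ]
  },
  "overall_cardinality": { "host.name": 5000 }
}

# Raise the limit on a closed job. Lowering it below current usage requires a clone and re-run.
POST _ml/anomaly_detectors/partitioned-traffic-job/_update
{
  "analysis_limits": { "model_memory_limit": "2gb" }
}
```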


## 8. Pre-aggregate your data [pre-aggregate-data]

You can speed up the analysis by summarizing your data with aggregations.

In certain cases, you cannot do aggregations to increase performance.

Please consult [Aggregating data for faster performance](ml-configuring-aggregation.md) to learn more.
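
As a rough sketch, here is a {{dfeed}} that pre-aggregates with a `date_histogram`; index, job, and field names are hypothetical, and the corresponding job would set `summary_count_field_name` to `doc_count` as described in the linked guide.

```console
# Hypothetical datafeed that feeds pre-aggregated buckets to the job.
# The max aggregation on the time field is required so each bucket keeps a timestamp,
# and the histogram interval should not exceed the job's bucket_span.
PUT _ml/datafeeds/datafeed-aggregated-traffic-job
{
  "job_id": "aggregated-traffic-job",
  "indices": [ "network-traffic-*" ],
  "aggregations": {
    "buckets": {
      "date_histogram": { "field": "@timestamp", "fixed_interval": "15m" },
      "aggregations": {
        "@timestamp": { "max": { "field": "@timestamp" } },
        "avg_bytes": { "avg": { "field": "bytes_sent" } }
      }
    }
  }
}
```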


## 9. Optimize the results retention [results-retention]

Set a results retention window to reduce the amount of results stored.

{{anomaly-detect-cap}} results are retained indefinitely by default. Results build up over time, and your result index may be quite large. A large results index is slow to query and takes up significant space on your cluster. Consider how long you wish to retain the results and set `results_retention_days` accordingly – for example, to 30 or 60 days – to avoid unnecessarily large result indices. Deleting old results does not affect the model behavior. You can change this setting for existing jobs.
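
For example, a sketch of keeping 60 days of results on a hypothetical existing job:

```console
# Job ID is a placeholder; older results are deleted automatically.
POST _ml/anomaly_detectors/large-job-dedicated-results/_update
{
  "results_retention_days": 60
}
```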


## 10. Optimize the renormalization window [renormalization-window]

Reduce the renormalization window to reduce processing workload.

When a new anomaly has a much higher score than any anomaly in the past, the anomaly scores are adjusted on a range from 0 to 100 based on the new data. This is called renormalization. It can mean rewriting a large number of documents in the results index. By default, renormalization happens for results from the last 30 days or 100 bucket spans, whichever is longer. When you are working at scale, set `renormalization_window_days` to a lower value to reduce the workload. You can change this setting for existing jobs; changes take effect after the job is reopened.
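
For instance, a sketch of lowering the window on a hypothetical existing job; reopen the job afterwards for the change to take effect:

```console
# Job ID is a placeholder.
POST _ml/anomaly_detectors/large-job-dedicated-results/_update
{
  "renormalization_window_days": 14
}
```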


## 11. Optimize the model snapshot retention [model-snapshot-retention]

Model snapshots are taken periodically, to ensure resilience in the event of a system failure and to allow you to manually revert to a specific point in time. These are stored in a compressed format in an internal index and kept according to the configured retention policy. Load is placed on the cluster when indexing a model snapshot and index size is increased as multiple snapshots are retained.

Also consider how long you wish to retain snapshots using `model_snapshot_retention_days`.

For more information, refer to [Model snapshots](https://www.elastic.co/guide/en/machine-learning/current/ml-model-snapshots.html).
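
As an illustration, a sketch that shortens snapshot retention on a hypothetical existing job; `daily_model_snapshot_retention_after_days` is a related setting that keeps only one snapshot per day beyond the given age.

```console
# Job ID and values are placeholders.
POST _ml/anomaly_detectors/large-job-dedicated-results/_update
{
  "model_snapshot_retention_days": 10,
  "daily_model_snapshot_retention_after_days": 1
}
```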


## 12. Optimize your search queries [search-queries]

If you are operating on a big scale, make sure that your {{dfeed}} query is as efficient as possible. There are different ways to write {{es}} queries and some of them are more efficient than others. Please consult [Tune for search speed](../../../deploy-manage/production-guidance/optimize-performance/search-speed.md) to learn more about {{es}} performance tuning.

You need to clone or recreate an existing job if you want to optimize its search query.


## 13. Consider using population analysis [population-analysis]

Population analysis is more memory efficient than individual analysis of each series. It builds a profile of what a "typical" entity does over a specified time period and then identifies when one is behaving abnormally compared to the population. Use population analysis for analyzing high cardinality fields if you expect that the entities of the population generally behave in the same way.
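
A minimal sketch of a population job that models each client IP against the behavior of all client IPs; the job and field names are hypothetical.

```console
# over_field_name turns this detector into a population analysis.
PUT _ml/anomaly_detectors/population-requests-job
{
  "analysis_config": {
    "bucket_span": "15m",
    "detectors": [
      { "function": "high_count", "over_field_name": "client.ip" }
    ],
    "influencers": [ "client.ip" ]
  },
  "data_description": { "time_field": "@timestamp" }
}
```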


## 14. Reduce the cost of forecasting [forecasting]

There are two main performance factors to consider when you create a forecast: indexing load and memory usage. Check the cluster monitoring data to learn the indexing rate and the memory usage.

Forecasting writes a new document to the result index for every forecasted element.
To reduce indexing load, consider a shorter forecast duration and/or try to avoid concurrent forecast requests. Further performance gains can be achieved by reviewing the job configuration; for example by using a dedicated results index, increasing the bucket span and/or by having lower cardinality partitioning fields.

The memory usage of a forecast is restricted to 20 MB by default. From 7.9, you can extend this limit by setting `max_model_memory` to a higher value. The maximum value is 40% of the memory limit of the {{anomaly-job}} or 500 MB. If the forecast needs more memory than the provided value, it spools to disk. Forecasts that spool to disk generally run slower. If you need to speed up forecasts, increase the available memory for the forecast. Forecasts that would take more than 500 MB to run won’t start because this is the maximum limit of disk space that a forecast is allowed to use.
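
For example, a sketch of requesting a short forecast with a larger in-memory budget; the job ID and values are hypothetical.

```console
# Requires 7.9 or later for max_model_memory; keep it below the limits described above.
POST _ml/anomaly_detectors/partitioned-traffic-job/_forecast
{
  "duration": "3d",
  "max_model_memory": "100mb"
}
```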
