[FSTORE-612] Add docs for feature monitoring #347

Merged · 19 commits · Feb 15, 2024

Commits
4acb90c  [FSTORE-612] Add docs for feature monitoring (javierdlrm, Feb 14, 2024)
8445157  Update docs/concepts/fs/feature_group/feature_monitoring.md (javierdlrm, Feb 14, 2024)
1f5ccfc  Update docs/user_guides/fs/feature_group/data_validation_advanced.md (javierdlrm, Feb 14, 2024)
3a6d893  Update docs/user_guides/fs/feature_group/data_validation_best_practic… (javierdlrm, Feb 14, 2024)
ecbba0a  Update docs/user_guides/fs/feature_group/data_validation_best_practic… (javierdlrm, Feb 14, 2024)
42d66cc  Update docs/user_guides/fs/feature_group/feature_monitoring.md (javierdlrm, Feb 14, 2024)
0919618  Update docs/user_guides/fs/feature_group/feature_monitoring.md (javierdlrm, Feb 14, 2024)
a3f15d4  Update docs/user_guides/fs/feature_group/feature_monitoring.md (javierdlrm, Feb 14, 2024)
b27da27  Update docs/user_guides/fs/feature_monitoring/feature_monitoring_adva… (javierdlrm, Feb 14, 2024)
eab5f32  Update docs/user_guides/fs/feature_view/feature_monitoring.md (javierdlrm, Feb 14, 2024)
cac541d  Update docs/user_guides/fs/feature_view/feature_monitoring.md (javierdlrm, Feb 14, 2024)
085ffca  Update docs/user_guides/fs/feature_monitoring/statistics_comparison.md (javierdlrm, Feb 14, 2024)
27a5939  Update docs/user_guides/fs/feature_monitoring/feature_monitoring_adva… (javierdlrm, Feb 14, 2024)
a757039  Update docs/user_guides/fs/feature_monitoring/index.md (javierdlrm, Feb 14, 2024)
a31a05d  Update docs/user_guides/fs/feature_monitoring/interactive_graph.md (javierdlrm, Feb 14, 2024)
9d2ca02  Update docs/user_guides/fs/feature_monitoring/statistics_comparison.md (javierdlrm, Feb 14, 2024)
24ee6e6  Update docs/user_guides/fs/feature_monitoring/statistics_comparison.md (javierdlrm, Feb 14, 2024)
371bc37  Update docs/user_guides/fs/feature_monitoring/statistics_comparison.md (javierdlrm, Feb 14, 2024)
77081a1  Address comments (javierdlrm, Feb 14, 2024)
6 changes: 3 additions & 3 deletions docs/admin/alert.md
@@ -34,7 +34,7 @@
button on the left side of the **email** row and fill out the form that pops up.
CRAM-MD5, LOGIN or PLAIN.

Optionally cluster wide Email alert receivers can be added in _Default receiver emails_.
-These receivers will be available to all users when they create event triggered [alerts](../../user_guides/fs/feature_group/advanced_data_validation/#setup-alerts).
+These receivers will be available to all users when they create event triggered [alerts](../../user_guides/fs/feature_group/data_validation_best_practices#setup-alerts).

### Step 3: Configure Slack Alerts
Alerts can also be sent via Slack messages. To be able to send Slack messages you first need to configure
@@ -47,7 +47,7 @@
a Slack webhook. Click on the _Configure_ button on the left side of the **slack
</figure>

Optionally cluster wide Slack alert receivers can be added in _Slack channel/user_.
-These receivers will be available to all users when they create event triggered [alerts](../../user_guides/fs/feature_group/advanced_data_validation/#setup-alerts).
+These receivers will be available to all users when they create event triggered [alerts](../../user_guides/fs/feature_group/data_validation_best_practices/#setup-alerts).

### Step 4: Configure Pagerduty Alerts
Pagerduty is another way you can send alerts from Hopsworks. Click on the _Configure_ button on the left side of
@@ -93,7 +93,7 @@
global:
...
```

-To test the alerts by creating triggers from Jobs and Feature group validations see [Alerts](../../user_guides/fs/feature_group/advanced_data_validation/#setup-alerts).
+To test the alerts by creating triggers from Jobs and Feature group validations see [Alerts](../../user_guides/fs/feature_group/data_validation_best_practices/#setup-alerts).

The yaml syntax in the UI is slightly different in that it does not allow double quotes (it will ignore the values but give no error).
Below is an example configuration, that can be used in the UI, with both email and slack receivers configured for system alerts.
20 changes: 20 additions & 0 deletions docs/concepts/fs/feature_group/feature_monitoring.md
@@ -0,0 +1,20 @@
Feature Monitoring complements data validation capabilities by allowing you to monitor your feature data after it has been ingested into the Feature Store.

HSFS supports monitoring features on your Feature Group by:

- transparently **computing statistics** on the whole or a subset of feature data defined by a detection window.
- **comparing statistics** against a reference window of feature data, and **configuring thresholds** to identify anomalous data.
- **configuring alerts** based on the statistics comparison results.

## Scheduled Statistics

After creating a Feature Group in HSFS, you can set up statistics monitoring to compute statistics over one or more features on a scheduled basis. Statistics are computed on the whole or a subset of feature data (i.e., detection window) already inserted into the Feature Group.

## Statistics Comparison

In addition to scheduled statistics, you can enable the comparison of statistics against a reference subset of feature data (i.e., reference window), and define the criteria for this comparison, including the statistics metric to compare and a threshold to identify anomalous values.

!!! info "Feature Monitoring Guide"
More information can be found in the [Feature monitoring guide](../../../user_guides/fs/feature_monitoring/index.md).
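To make the two setups above concrete, here is a minimal sketch of what enabling them from HSFS can look like. The method names and parameters (`create_statistics_monitoring`, `with_detection_window`, `compare_on`, ...) are assumptions drawn from the guide linked above and should be verified against it:

```python3
# Sketch, assumed API: compute statistics on one feature every day at 12:00.
fg.create_statistics_monitoring(
    name="amount_stats",
    feature_name="amount",             # omit to monitor all features
    cron_expression="0 0 12 ? * * *",  # daily at 12:00
).save()

# Sketch, assumed API: compare a detection window against a reference window.
fg.create_feature_monitoring(
    name="amount_drift",
    feature_name="amount",
    cron_expression="0 0 12 ? * * *",
).with_detection_window(
    time_offset="1d",    # statistics over the last day of data
).with_reference_window(
    time_offset="1w",    # compared against one-week-old data
    window_length="1d",
).compare_on(
    metric="mean",
    threshold=0.2,       # differences above the threshold are flagged as anomalous
).save()
```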


20 changes: 20 additions & 0 deletions docs/concepts/fs/feature_view/feature_monitoring.md
@@ -0,0 +1,20 @@
Feature Monitoring complements data validation capabilities by allowing you to monitor your feature data once it has been ingested into the Feature Store.

HSFS supports monitoring features on your Feature View by:

- transparently **computing statistics** on the whole or a subset of feature data defined by a detection window.
- **comparing statistics** against a reference window of feature data (e.g., training dataset), and **configuring thresholds** to identify anomalous data.
- **configuring alerts** based on the statistics comparison results.

## Scheduled Statistics

After creating a Feature View in HSFS, you can set up statistics monitoring to compute statistics over one or more features on a scheduled basis. Statistics are computed on the whole or a subset of feature data (i.e., detection window) using the Feature View query.

## Statistics Comparison

In addition to scheduled statistics, you can enable the comparison of statistics against a reference subset of feature data (i.e., reference window), typically a training dataset, and define the criteria for this comparison, including the statistics metric to compare and a threshold to identify anomalous values.

!!! info "Feature Monitoring Guide"
More information can be found in the [Feature monitoring guide](../../../user_guides/fs/feature_monitoring/index.md).
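The pattern is the same for a Feature View, except that the reference is typically a training dataset. Again a sketch, with `with_reference_training_dataset` as an assumed method name to verify against the guide:

```python3
# Sketch, assumed API: compare recent feature data against the training dataset.
fv.create_feature_monitoring(
    name="amount_vs_training",
    feature_name="amount",
    cron_expression="0 0 12 ? * * *",
).with_detection_window(
    time_offset="1d",
).with_reference_training_dataset(
    training_dataset_version=1,
).compare_on(
    metric="mean",
    threshold=0.2,
).save()
```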


63 changes: 35 additions & 28 deletions docs/css/custom.css
@@ -1,58 +1,67 @@
:root {
-  --md-primary-fg-color: #1EB382;
+  --md-primary-fg-color: #1eb382;
  --md-secondary-fg-color: #188a64;
  --md-tertiary-fg-color: #0d493550;
  --md-quaternary-fg-color: #fdfdfd;
  --border-radius-variable: 5px;
}

.md-footer__inner:not([hidden]) {
-  display: none
+  display: none;
}

/* Lex did stuff here */
-.svg_topnav{
+.svg_topnav {
  width: 12px;
  filter: invert(100);
}
-.svg_topnav:hover{
+.svg_topnav:hover {
  width: 12px;
  filter: invert(10);
}

-.md-header[data-md-state=shadow] {
+.md-header[data-md-state="shadow"] {
  box-shadow: 0 0 0 0;
}

.md-tabs__item:hover {
  background-color: var(--md-tertiary-fg-color);
  transition: background-color 450ms;
-
}

-.md-sidebar__scrollwrap{
+.md-sidebar__scrollwrap {
  background-color: var(--md-quaternary-fg-color);
  padding: 15px 5px 5px 5px;
  border-radius: var(--border-radius-variable);
}

-
-.image_logo_02{
-  width:450px;
+.image_logo_02 {
+  width: 450px;
}

/* End of Lex did stuff here */

+/* no-icon style for admonitions */
+.md-typeset .no-icon > .admonition-title::before,
+.md-typeset .no-icon > summary::before {
+  display: none;
+}
+.md-typeset .no-icon > :is(.admonition-title, summary) {
+  padding-left: 1rem;
+}
+/* end of no-icon style */
+
.md-header__button.md-logo {
-  margin: .1rem;
-  padding: .1rem;
+  margin: 0.1rem;
+  padding: 0.1rem;
}

-.md-header__button.md-logo img, .md-header__button.md-logo svg {
+.md-header__button.md-logo img,
+.md-header__button.md-logo svg {
  display: block;
  width: 1.8rem;
  height: 1.8rem;
-  fill: currentColor;
+  fill: rgba(43, 155, 70, 0.1);
}

.md-tabs {
@@ -63,7 +72,6 @@
  transition: background-color 250ms;
}

-
.wrapper {
  display: grid;
  grid-template-columns: repeat(4, 1fr);
@@ -72,9 +80,9 @@
}

.wrapper * {
-    border: 2px solid green;
-    text-align: center;
-    padding: 70px 0;
+  border: 2px solid green;
+  text-align: center;
+  padding: 70px 0;
}

.one {
@@ -107,13 +115,12 @@
  display: none !important;
}

-
-@media screen and (max-width: 479px){
-  .md-sidebar--primary, .md-sidebar {
-    z-index: 50 !important;
-  }
-  .md-logo {
-    visibility: hidden;
-  }
-
-}
+@media screen and (max-width: 479px) {
+  .md-sidebar--primary,
+  .md-sidebar {
+    z-index: 50 !important;
+  }
+  .md-logo {
+    visibility: hidden;
+  }
+}
6 changes: 3 additions & 3 deletions docs/user_guides/fs/feature_group/data_validation.md
@@ -64,7 +64,7 @@
In order to define and validate an expectation when writing to a Feature Group,

- A Hopsworks project. If you don't have a project yet you can go to [managed.hopsworks.ai](https://managed.hopsworks.ai), signup with your email and create your first project.
- An API key, you can get one by following the instructions [here](../../../setup_installation/common/api_key.md)
-- The [hopsworks python library](../../client_installation/index.md) installed in your client
+- The [Hopsworks Python library](https://pypi.org/project/hopsworks) installed in your client. See the [installation guide](../../client_installation/index.md).

#### Connect your notebook to Hopsworks
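The connection code itself is collapsed in this diff; for context, a minimal sketch of the usual login flow with the `hopsworks` client:

```python3
import hopsworks

project = hopsworks.login()  # prompts for the API key created above
fs = project.get_feature_store()
```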

@@ -174,7 +174,7 @@
That is all there is to it. Hopsworks will now automatically use your suite to v
```python3
job, validation_report = fg.insert(df.head(5))
```

-As you can see, Hopsworks runs the validation in the client before attempting to insert the data. By default, Hopsworks will try to insert the data even if validation fails to prevent data loss. However it can be configured for production setup to be more restrictive, checkout the [data validation advanced guide](advanced_data_validation.md).
+As you can see, Hopsworks runs the validation in the client before attempting to insert the data. By default, Hopsworks will try to insert the data even if validation fails to prevent data loss. However it can be configured for production setup to be more restrictive, checkout the [data validation advanced guide](data_validation_advanced.md).

!!!info
Note that once the Expectation Suite is attached to the Feature Group, any subsequent attempt to insert to this Feature Group will apply the Data Validation step even from a different client or in a scheduled job.
@@ -214,4 +214,4 @@
The integration between Hopsworks and Great Expectations makes it simple to add

## Going further

-If you wish to find out more about how to use the data validation API or best practices for development or production pipelines in Hopsworks, checkout the [advanced guide](advanced_data_validation.md).
+If you wish to find out more about how to use the data validation API or best practices for development or production pipelines in Hopsworks, checkout the [advanced guide](data_validation_advanced.md) and [best practices guide](data_validation_best_practices.md).
@@ -1,6 +1,6 @@
# Advanced Data Validation Options and Best Practices

-The introduction to data vaildation guide can be found [here](data_validation.md). The notebook example to get started with Data Validation in Hopsworks can be found [here](https://colab.research.google.com/github/logicalclocks/hopsworks-tutorials/blob/master/integrations/great_expectations/fraud_batch_data_validation.ipynb).
+The introduction to the data validation guide can be found [here](data_validation.md). The notebook example to get started with Data Validation in Hopsworks can be found [here](https://colab.research.google.com/github/logicalclocks/hopsworks-tutorials/blob/master/integrations/great_expectations/fraud_batch_data_validation.ipynb).

## Data Validation Configuration Options in Hopsworks

@@ -55,7 +55,7 @@
The one constant in life is change. If you need to add, remove or edit an expect

Go to the Feature Group edit page, in the expectation section. You can click on the expectation you want to edit and edit the json configuration. Check out Great Expectations documentation if you need more information on a particular expectation.

-### In Hopsworks Python Client
+#### In Hopsworks Python Client

There are several ways to edit an Expectation in the python client. You can use the Great Expectations API or go directly through Hopsworks. In the latter case, if you want to edit or remove an expectation, you will need the Hopsworks expectation ID. It can be found in the UI or in the meta field of an expectation. Note that you must have inserted data in the FG and attached the expectation suite to enable the Expectation API. A minimal sketch of that flow follows.
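The accessor and method names used here (`get_expectation_suite(ge_type=False)`, `replace_expectation`, `remove_expectation`) are assumptions to verify against the hsfs reference:

```python3
# Fetch the suite attached to the Feature Group as Hopsworks objects (assumed API)
suite = fg.get_expectation_suite(ge_type=False)

# The Hopsworks expectation ID lives in the meta field of each expectation
expectation = suite.expectations[0]
expectation_id = expectation.meta["expectationId"]

# Edit in place, or remove by ID (assumed API)
expectation.kwargs["min_value"] = 0
suite.replace_expectation(expectation)
suite.remove_expectation(expectation_id=expectation_id)
```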

@@ -122,7 +122,7 @@
The boilerplate of uploading report on insertion is taken care of by hopsworks,
```python3
fg.save_validation_report(ge_report)
```

-#### Monitor and Fetch Validation Reports
+### Monitor and Fetch Validation Reports

A summary of uploaded reports will then be available via an API call or in the Hopsworks UI, enabling easy monitoring. For in-depth analysis, it is possible to download the complete report from the UI.
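For instance, fetching report summaries from the client might look like the following; the helper names are assumptions to verify against the hsfs reference:

```python3
# Assumed API: latest report summary, or the full history of uploaded reports
latest_report = fg.get_latest_validation_report()
all_reports = fg.get_all_validation_reports()
```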

@@ -173,116 +173,3 @@
ge_report = ge_df.validate()
```

Note that you should always use an expectation suite that has been saved to Hopsworks if you intend to upload the associated validation report.

## Best Practices

Below is a set of recommendations and code snippets to help you follow best practices when integrating a data validation step into your feature engineering pipelines. Rather than being prescriptive, we want to showcase how the API and configuration options can help adapt validation to your use-case.

### Development

Data validation is generally considered to be a production-only feature and as such is often only set up once a project has reached the end of the development phase. At Hopsworks, we think there is a lot of value in setting up validation during early development. That's why we made it quick to get started and ensured that, by default, data validation is never an obstacle to inserting data.

#### Validate Early

As often with data validation, the best piece of advice is to set it up early in your development process. Use this phase to build a history you can draw on when the time comes to set quality requirements for a project in production. We made a code snippet to help you get started quickly:

```python3
import great_expectations as ge
import pandas as pd

# Load sample data. Replace it with your own!
my_data_df = pd.read_csv("https://repo.hops.works/master/hopsworks-tutorials/data/card_fraud_data/credit_cards.csv")

# Use Great Expectations profiler (ignore deprecation warning)
expectation_suite_profiled, validation_report = ge.from_pandas(my_data_df).profile(profiler=ge.profile.BasicSuiteBuilderProfiler)

# Create a Feature Group on Hopsworks with an expectation suite attached. Don't forget to change the primary key!
my_validated_data_fg = fs.get_or_create_feature_group(
    name="my_validated_data_fg",
    version=1,
    description="My data",
    primary_key=["cc_num"],
    expectation_suite=expectation_suite_profiled,
)
```

Any data you insert into the Feature Group from now on will be validated, and a report will be uploaded to Hopsworks.

```python3
# Insert and validate your data
insert_job, validation_report = my_validated_data_fg.insert(my_data_df)
```

The Great Expectations profiler can inspect your data to build a standard Expectation Suite. You can attach this Expectation Suite directly when creating your Feature Group to make sure every piece of data finding its way into Hopsworks gets validated. Hopsworks will default to its `"ALWAYS"` ingestion policy, meaning data are ingested whether validation succeeds or not. This way data validation is not a barrier, just a monitoring tool.
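The `"ALWAYS"` policy can also be set explicitly when saving the suite, mirroring the `"STRICT"` example in the production section below:

```python3
# Ingest regardless of the validation outcome; reports are still uploaded
my_validated_data_fg.save_expectation_suite(
    expectation_suite_profiled,
    validation_ingestion_policy="ALWAYS",
)
```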

#### Identify Unreliable Features

Once you set up data validation, every insertion will upload a validation report to Hopsworks. Identifying Features which often have null values or wild statistical variations can help detect unreliable Features that need refinement or should be avoided. Here are a few expectations you might find useful (a sketch of registering them follows the list):

- `expect_column_values_to_not_be_null`
- `expect_column_(min/max/mean/stdev)_to_be_between`
- `expect_column_values_to_be_unique`
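As a sketch, registering a couple of these on the profiled suite from above (the column names are placeholders for your own features):

```python3
from great_expectations.core import ExpectationConfiguration

expectation_suite_profiled.add_expectation(
    ExpectationConfiguration(
        expectation_type="expect_column_values_to_not_be_null",
        kwargs={"column": "cc_num"},
    )
)
expectation_suite_profiled.add_expectation(
    ExpectationConfiguration(
        expectation_type="expect_column_mean_to_be_between",
        kwargs={"column": "amount", "min_value": 0, "max_value": 10000},
    )
)
my_validated_data_fg.save_expectation_suite(expectation_suite_profiled)
```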

#### Get the stakeholders involved

Hopsworks UI helps involve every project stakeholder by enabling both setting and monitoring of data quality requirements. No coding skills needed! You can monitor data quality requirements by checking out the validation reports and results on the Feature Group page.

If you need to set or edit the existing requirements, you can go to the Feature Group edit page. The Expectation Suite section allows you to edit individual expectations and set success parameters that match ever-changing business requirements.

### Production

Models in production require high-quality data to make accurate predictions for your customers. Hopsworks can use your Expectation Suite as a gatekeeper, making it simple to prevent low-quality data from making its way into production. Below are some simple tips and snippets to make the most of your data validation when your project is ready to enter its production phase.

#### Be Strict in Production

Whether you reuse an existing Feature Group or create a new one for production (recommended), we recommend you set the validation ingestion policy of your Expectation Suite to `"STRICT"`.

```python3
fg_prod.save_expectation_suite(
    my_suite,
    validation_ingestion_policy="STRICT",
)
```

In this setup, Hopsworks will abort inserting a DataFrame that does not successfully fulfill all expectations in the attached Expectation Suite. This ensures data quality standards are upheld for every insertion and provides downstream users with strong guarantees.

#### Avoid Data Loss on Materialization Jobs

Aborting insertions of DataFrames which do not satisfy the data quality standards can lead to data loss in your materialization job. To avoid such loss, we recommend creating a duplicate Feature Group with the same Expectation Suite in `"ALWAYS"` mode to hold the rejected data.

```python3
job, report = fg_prod.insert(df)

if report["success"] is False:
job, report = fg_rejected.insert(df)
```

#### Take Advantage of the Validation History

You can easily retrieve the validation history of a specific expectation and export it to your favourite visualisation tool. You can filter on time and on whether the insertion was successful or not:

```python3
validation_history = fg.get_validation_history(
    expectation_id=my_id,
    filters=["REJECTED", "UNKNOWN"],
    ge_type=False,
)

timeseries = pd.DataFrame(
    {
        "observed_value": [res.result["observed_value"] for res in validation_history],
        "validation_time": [res.validation_time for res in validation_history],
    }
)

# export to your preferred Dashboard
```

#### Setup Alerts

While checking that your feature engineering pipeline executed properly in the morning can be good enough in the development phase, it won't make the cut for demanding production use-cases. In Hopsworks, you can set up alerts for when ingestion fails or succeeds.

First you will need to configure your preferred communication endpoint: Slack, email or Pagerduty. Check out [this page](../../../admin/alert.md) for more information on how to set it up. A typical use-case would be to add an alert on ingestion success to a Feature Group you created to hold data that failed validation. Here is a quick walkthrough:

1. Go to the Feature Group page in the UI
2. Scroll down and click on the `Add an alert` button.
3. Choose the trigger, receiver and severity and click save.

## Conclusion

Hopsworks complements Great Expectations by automatically running the validation, persisting the reports alongside your data, and allowing you to monitor data quality in its UI. How you decide to make use of these tools depends on your application and requirements. Whether in development or in production, real-time or batch, we think there is a configuration that will work for your team. Check out our [quick hands-on tutorial](https://colab.research.google.com/github/logicalclocks/hopsworks-tutorials/blob/master/integrations/great_expectations/fraud_batch_data_validation.ipynb) to start applying what you learned so far.