Merge remote-tracking branch 'upstream/main' into FSTORE-1008-java-engine

davitbzh · davitbzh · commit 289e2dd45a13 · 2025-01-21T22:18:24.000+01:00
diff --git a/docs/assets/images/admin/audit/audit-log-vars.png b/docs/assets/images/admin/audit/audit-log-vars.png
diff --git a/docs/assets/images/guides/fs/storage_connector/s3_creation.png b/docs/assets/images/guides/fs/storage_connector/s3_creation.png
diff --git a/docs/setup_installation/admin/audit/audit-logs.md b/docs/setup_installation/admin/audit/audit-logs.md
@@ -28,8 +28,7 @@ To edit a configuration variable, you can click on the edit button (:material-pe
 
     | Name                  | Description                                                                                                                                                                                             |
     | --------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-    | audit_log_count       | the number of files to keep when rotating logs (java.util.logging.FileHandler.count)                                                                                                                    |
-    | audit_log_dir         | the path where audit logs are saved                                                                                                                                                                     |
+    | audit_log_count       | the number of files to keep when rotating logs (java.util.logging.FileHandler.count)                                                                                                                    |
     | audit_log_file_format | log file name pattern. (java.util.logging.FileHandler.pattern)                                                                                                                                          |
     | audit_log_file_type   | the output format of the log file. Can be one of java.util.logging.SimpleFormatter (default), io.hops.hopsworks.audit.helper.JSONLogFormatter, or io.hops.hopsworks.audit.helper.HtmlLogFormatter.      |
     | audit_log_size_limit  | the maximum number of bytes to write to any one file. (java.util.logging.FileHandler.limit)                                                                                                             |
@@ -40,7 +39,7 @@ To edit a configuration variable, you can click on the edit button (:material-pe
  
 ## Step 2: Access the Logs
  
-To access the audit logs, SSH into the **head node** of your Hopsworks cluster and navigate to the path set in the _audit\_log\_dir_ configuration variable.
+To access the audit logs, SSH into the **instance pod** of your Hopsworks cluster and navigate to the path `/opt/payara/appserver/glassfish/nodes/<node name>/<instance name>/logs/audit`.
  
 Audit logs follow the format set in the _audit\_log\_file\_type_ configuration variable.
 
diff --git a/docs/tutorials/index.md b/docs/tutorials/index.md
@@ -24,19 +24,19 @@ This is a batch use case variant of the fraud tutorial, it will give you a high
 
 | Notebooks   |                                      |
 | ----------- | ------------------------------------ |
-| 1. How to load, engineer and create feature groups | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/logicalclocks/hopsworks-tutorials/blob/master/fraud_batch/1_fraud_batch_feature_pipeline.ipynb){:target="_blank"}        |
-| 2. How to create training datasets                 | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/logicalclocks/hopsworks-tutorials/blob/master/fraud_batch/2_fraud_batch_training_pipeline.ipynb){:target="_blank"} |
-| 3. How to train a model from the feature store     | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/logicalclocks/hopsworks-tutorials/blob/master/fraud_batch/3_fraud_batch_inference.ipynb){:target="_blank"}        |
+| 1. [How to load, engineer and create feature groups](https://github.com/logicalclocks/hopsworks-tutorials/blob/master/batch-ai-systems/fraud_batch/1_fraud_batch_feature_pipeline.ipynb){:target="_blank"}        |
+| 2. [How to create training datasets](https://github.com/logicalclocks/hopsworks-tutorials/blob/master/batch-ai-systems/fraud_batch/2_fraud_batch_training_pipeline.ipynb){:target="_blank"} |
+| 3. [How to train a model from the feature store](https://github.com/logicalclocks/hopsworks-tutorials/blob/master/batch-ai-systems/fraud_batch/3_fraud_batch_inference.ipynb){:target="_blank"}        |
 
 ### Online
 This is a online use case variant of the fraud tutorial, it is similar to the batch use case, however, in this tutorial you will get introduced to the usage of Feature Groups which are kept in online storage, and how to access single feature vectors from the online storage
 at low latency. Additionally, the model will be deployed as a model serving instance, to provide a REST endpoint for real time serving.
 
 | Notebooks   |                                      |
 | ----------- | ------------------------------------ |
-| 1. How to load, engineer and create feature groups | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/logicalclocks/hopsworks-tutorials/blob/master/fraud_online/1_fraud_online_feature_pipeline.ipynb){:target="_blank"}        |
-| 2. How to create training datasets                 | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/logicalclocks/hopsworks-tutorials/blob/master/fraud_online/2_fraud_online_training_pipeline.ipynb){:target="_blank"} |
-| 3. How to train a model from the feature store and deploying it as a serving instance together with the online feature store | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/logicalclocks/hopsworks-tutorials/blob/master/fraud_online/3_fraud_online_inference_pipeline.ipynb){:target="_blank"}        |
+| 1. [How to load, engineer and create feature groups](https://github.com/logicalclocks/hopsworks-tutorials/blob/master/real-time-ai-systems/fraud_online/1_fraud_online_feature_pipeline.ipynb){:target="_blank"}        |
+| 2. [How to create training datasets](https://github.com/logicalclocks/hopsworks-tutorials/blob/master/real-time-ai-systems/fraud_online/2_fraud_online_training_pipeline.ipynb){:target="_blank"} |
+| 3. [How to train a model from the feature store and deploy it as a serving instance together with the online feature store](https://github.com/logicalclocks/hopsworks-tutorials/blob/master/real-time-ai-systems/fraud_online/3_fraud_online_inference_pipeline.ipynb){:target="_blank"}        |
 
 ## Churn Tutorial
 
@@ -45,17 +45,9 @@ at low latency. Additionally, the model will be deployed as a model serving inst
 
 | Notebooks   |                                      |
 | ----------- | ------------------------------------ |
-| 1. How to load, engineer and create feature groups | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/logicalclocks/hopsworks-tutorials/blob/master/churn/1_churn_feature_pipeline.ipynb){:target="_blank"}        |
-| 2. How to create training datasets                 | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/logicalclocks/hopsworks-tutorials/blob/master/churn/2_churn_training_pipeline.ipynb){:target="_blank"} |
-| 3. How to train a model from the feature store and deploying it as a serving instance together with the online feature store | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/logicalclocks/hopsworks-tutorials/blob/master/churn/3_churn_batch_inference.ipynb){:target="_blank"}        |
-
-## Iris Tutorial
-
-In this tutorial you will learn how to create an online prediction service for the Iris flower prediction problem.
-
-| Notebooks   |                                      |
-| ----------- | ------------------------------------ |
-| 1. All-in-one notebook, showing how to create the needed feature groups, train the model and deploy it as a serving instance | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/logicalclocks/hopsworks-tutorials/blob/master/iris/iris_tutorial.ipynb){:target="_blank"}        |
+| 1. How to load, engineer and create feature groups | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/logicalclocks/hopsworks-tutorials/blob/master/batch-ai-systems/churn/1_churn_feature_pipeline.ipynb){:target="_blank"}        |
+| 2. How to create training datasets                 | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/logicalclocks/hopsworks-tutorials/blob/master/batch-ai-systems/churn/2_churn_training_pipeline.ipynb){:target="_blank"} |
+| 3. How to train a model from the feature store and deploy it as a serving instance together with the online feature store | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/logicalclocks/hopsworks-tutorials/blob/master/batch-ai-systems/churn/3_churn_batch_inference.ipynb){:target="_blank"}        |
 
 ## Integration Tutorials
 
diff --git a/docs/user_guides/fs/feature_group/feature_monitoring.md b/docs/user_guides/fs/feature_group/feature_monitoring.md
@@ -9,7 +9,7 @@ Before continuing with this guide, see the [Feature monitoring guide](../feature
 
 ## Code
 
-In this section, we show you how to setup feature monitoring in a Feature Group using the ==Hopsworks Python library==. Alternatively, you can get started quickly by running our [tutorial for feature monitoring](https://github.com/logicalclocks/hopsworks-tutorials/blob/master/integrations/feature-monitoring/feature-monitoring.ipynb).
+In this section, we show you how to set up feature monitoring in a Feature Group using the ==Hopsworks Python library==. Alternatively, you can get started quickly by running our [tutorial for feature monitoring](https://github.com/logicalclocks/hopsworks-tutorials/blob/master/api_examples/feature_monitoring.ipynb).
 
 First, checkout the pre-requisite and Hopsworks setup to follow the guide below. Create a project, install the [Hopsworks Python library](https://pypi.org/project/hopsworks) in your environment, connect via the generated API key. The second step is to start a new configuration for feature monitoring. 
 
diff --git a/docs/user_guides/fs/feature_monitoring/feature_monitoring_advanced.md b/docs/user_guides/fs/feature_monitoring/feature_monitoring_advanced.md
@@ -1,6 +1,6 @@
 # Advanced guide
 
-An introduction to Feature Monitoring can be found in the guides for [Feature Groups](../feature_group/feature_monitoring.md) and [Feature Views](../feature_view/feature_monitoring.md). In addition, you can get started quickly by running our [tutorial for feature monitoring](https://github.com/logicalclocks/hopsworks-tutorials/blob/master/integrations/feature-monitoring/feature-monitoring.ipynb).
+An introduction to Feature Monitoring can be found in the guides for [Feature Groups](../feature_group/feature_monitoring.md) and [Feature Views](../feature_view/feature_monitoring.md). In addition, you can get started quickly by running our [tutorial for feature monitoring](https://github.com/logicalclocks/hopsworks-tutorials/blob/master/api_examples/feature_monitoring.ipynb).
 
 ## Retrieve feature monitoring configurations
 
diff --git a/docs/user_guides/fs/feature_view/feature_monitoring.md b/docs/user_guides/fs/feature_view/feature_monitoring.md
@@ -9,7 +9,7 @@ Before continuing with this guide, see the [Feature monitoring guide](../feature
 
 ## Code
 
-In this section, we show you how to setup feature monitoring in a Feature View using the ==Hopsworks Python library==. Alternatively, you can get started quickly by running our [tutorial for feature monitoring](https://github.com/logicalclocks/hopsworks-tutorials/blob/master/integrations/feature-monitoring/feature-monitoring.ipynb).
+In this section, we show you how to set up feature monitoring in a Feature View using the ==Hopsworks Python library==. Alternatively, you can get started quickly by running our [tutorial for feature monitoring](https://github.com/logicalclocks/hopsworks-tutorials/blob/master/api_examples/feature_monitoring.ipynb).
 
 First, checkout the pre-requisite and Hopsworks setup to follow the guide below. Create a project, install the [Hopsworks Python library](https://pypi.org/project/hopsworks) in your environment and connect via the generated API key. The second step is to start a new configuration for feature monitoring. 
 
diff --git a/docs/user_guides/fs/feature_view/overview.md b/docs/user_guides/fs/feature_view/overview.md
@@ -44,7 +44,7 @@ If you want to understand more about the concept of feature view, you can refer
                                             .build();
     ```
 
-You can refer to [query](./query.md) and [transformation function](./model-dependent-transformations.md) for creating `query` and `transformation_function`. To see a full example of how to create a feature view, you can read [this notebook](https://github.com/logicalclocks/hopsworks-tutorials/blob/master/fraud_batch/2_feature_view_creation.ipynb).
+You can refer to [query](./query.md) and [transformation function](./model-dependent-transformations.md) for creating `query` and `transformation_function`. To see a full example of how to create a feature view, you can read [this notebook](https://github.com/logicalclocks/hopsworks-tutorials/blob/master/batch-ai-systems/fraud_batch/2_fraud_batch_training_pipeline.ipynb).
 
 ## Retrieval
 Once you have created a feature view, you can retrieve it by its name and version.
diff --git a/docs/user_guides/fs/feature_view/training-data.md b/docs/user_guides/fs/feature_view/training-data.md
@@ -2,7 +2,7 @@
 
 Training data can be created from the feature view and used by different ML libraries for training different models.
 
-You can read [training data concepts](../../../concepts/fs/feature_view/offline_api.md) for more details. To see a full example of how to create training data, you can read [this notebook](https://github.com/logicalclocks/hopsworks-tutorials/blob/master/fraud_batch/2_feature_view_creation.ipynb).
+You can read [training data concepts](../../../concepts/fs/feature_view/offline_api.md) for more details. To see a full example of how to create training data, you can read [this notebook](https://github.com/logicalclocks/hopsworks-tutorials/blob/master/batch-ai-systems/fraud_batch/2_fraud_batch_training_pipeline.ipynb).
 
 For Python-clients, handling small or moderately-sized data, we recommend enabling the [ArrowFlight Server with DuckDB](../../../setup_installation/common/arrow_flight_duckdb.md) service,
 which will provide significant speedups over Spark/Hive for reading and creating in-memory training datasets.
@@ -29,7 +29,7 @@ print(job.id) # get the job's id and view the job status in the UI
 ### Extra filters
 Sometimes data scientists need to train different models using subsets of a dataset. For example, there can be different models for different countries, seasons, and different groups. One way is to create different feature views for training different models. Another way is to add extra filters on top of the feature view when creating training data.
 
-In the [transaction fraud example](https://github.com/logicalclocks/hopsworks-tutorials/blob/master/fraud_batch/1_feature_groups.ipynb), there are different transaction categories, for example: "Health/Beauty", "Restaurant/Cafeteria", "Holliday/Travel" etc. Examples below show how to create training data for different transaction categories.
+In the [transaction fraud example](https://github.com/logicalclocks/hopsworks-tutorials/blob/master/batch-ai-systems/fraud_batch/1_fraud_batch_feature_pipeline.ipynb), there are different transaction categories, for example: "Health/Beauty", "Restaurant/Cafeteria", "Holliday/Travel" etc. Examples below show how to create training data for different transaction categories.
 ```python
 # Create a training dataset for Health/Beauty
 df_health = feature_view.training_data(
diff --git a/docs/user_guides/fs/storage_connector/creation/s3.md b/docs/user_guides/fs/storage_connector/creation/s3.md
@@ -17,6 +17,7 @@ When you're finished, you'll be able to read files using Spark through HSFS APIs
 Before you begin this guide you'll need to retrieve the following information from your AWS S3 account and bucket:
 
 - **Bucket:** You will need a S3 bucket that you have access to. The bucket is identified by its name.
+- **Path (Optional):** If needed, a path can be defined to ensure that all operations are restricted to a specific location within the bucket.
 - **Region (Optional):** You will need an S3 region to have complete control over data when managing the feature group that relies on this storage connector. The region is identified by its code.
 - **Authentication Method:** You can authenticate using Access Key/Secret, or use IAM roles. If you want to use an IAM role it either needs to be attached to the entire Hopsworks cluster or Hopsworks needs to be able to assume the role. See [IAM role documentation](../../../../setup_installation/admin/roleChaining.md) for more information.
 - **Server Side Encryption details:** If your bucket has server side encryption (SSE) enabled, make sure you know which algorithm it is using (AES256 or SSE-KMS). If you are using SSE-KMS, you need the resource ARN of the managed key.
diff --git a/docs/user_guides/fs/vector_similarity_search.md b/docs/user_guides/fs/vector_similarity_search.md
@@ -108,4 +108,4 @@ There are 2 types of online feature stores in Hopsworks: online store (RonDB) an
 Create a new index per feature group to optimize retrieval performance.
 
 # Next step
-Explore the [news search example](https://github.com/logicalclocks/hopsworks-tutorials/blob/master/api_examples/hsfs/knn_search/news-search-knn.ipynb), demonstrating how to use Hopsworks for implementing a news search application using natural language in the application. Additionally, you can see the application of querying similar embeddings with additional features in this [news rank example](https://github.com/logicalclocks/hopsworks-tutorials/blob/master/api_examples/hsfs/knn_search/news-search-rank-view.ipynb).
+Explore the [news search example](https://github.com/logicalclocks/hopsworks-tutorials/blob/master/api_examples/vector_similarity_search/1_feature_group_embeddings_api.ipynb), demonstrating how to use Hopsworks for implementing a news search application using natural language in the application. Additionally, you can see the application of querying similar embeddings with additional features in this [news rank example](https://github.com/logicalclocks/hopsworks-tutorials/blob/master/api_examples/vector_similarity_search/2_feature_view_embeddings_api.ipynb).