
Commit ee71502

Fix link checks and corresponding content.

1 parent cc126d1

13 files changed: +76 -141 lines changed

docs/setup_installation/admin/roleChaining.md (+3 -2)

@@ -13,7 +13,8 @@ Before you begin this guide you'll need the following:
 - Administrator account on a Hopsworks cluster.
 
 ### Step 1: Create an instance profile role
-To use role chaining the head node need to be able to impersonate the roles you want to be linked to your project. For this you need to create an instance profile with assume role permissions and attach it to your head node. For more details about the creation of instance profile see the [aws documentation](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2.html). If running in [managed.hopsworks.ai](https://managed.hopsworks.ai) you can also refer to our [getting started guide](../setup_installation/aws/getting_started.md#step-3-creating-instance-profile).
+To use role chaining, the head node needs to be able to impersonate the roles you want to be linked to your project. For this, you need to create an instance profile with assume-role permissions and attach it to your head node. For more details about the creation of instance profiles, see the [aws documentation](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2.html).
+
 
 !!!note
     To ensure that the Hopsworks users can't use the head node instance profile and impersonate the roles by their own means, you need to ensure that they can't execute code on the head node. This means having all jobs running on worker nodes and using EKS to run jupyter notebooks.

@@ -75,7 +76,7 @@ Add mappings by clicking on *New role chaining*. Enter the project name. Select
     <figcaption>Create Role Chaining</figcaption>
 </figure>
 
-Project member can now create connectors using *temporary credentials* to assume the role you configured. More detail about using temporary credentials can be found [here](../user_guides/fs/storage_connector/creation/s3.md#temporary-credentials).
+Project members can now create connectors using *temporary credentials* to assume the role you configured. More details about using temporary credentials can be found [here](../../user_guides/fs/storage_connector/creation/s3.md#temporary-credentials).
 
 Project member can see the list of role they can assume by going the _Project Settings_ -> [Assuming IAM Roles](../../../user_guides/projects/iam_role/iam_role_chaining) page.
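For orientation on what the retargeted *temporary credentials* link describes: once a mapping exists, a project member assumes the role through a storage connector. A minimal sketch with the `hopsworks` Python client; the connector name `s3_assumed_role` and the S3 path are hypothetical:

```python
import hopsworks

# Log in from a job or notebook running inside the project.
project = hopsworks.login()
fs = project.get_feature_store()

# "s3_assumed_role" is a hypothetical S3 connector created with
# *temporary credentials* and one of the roles mapped to this project.
connector = fs.get_storage_connector("s3_assumed_role")

# Reads are authenticated with short-lived credentials obtained by
# assuming the mapped IAM role on the user's behalf.
df = connector.read(data_format="csv", path="s3://my-bucket/my-prefix/")
```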

docs/setup_installation/admin/user.md (+1 -1)

@@ -87,7 +87,7 @@ it securely to the user.
 ### Step 5: Reset user password
 
 In the case where a user loses her/his password and can not recover it with the
-[password recovery](../user_guides/projects/auth/recovery.md), an administrator can reset it for them.
+[password recovery](../../user_guides/projects/auth/recovery.md), an administrator can reset it for them.
 
 On the bottom of the _Users_ page click on the _Reset a user password_ link. A popup window with a dropdown for
 searching users by name or email will open. Find the user and click on _Reset new password_.

docs/setup_installation/aws/instance_profile_permissions.md (-116)

This file was deleted.
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,56 @@
1+
# ArrowFlight Server with DuckDB
2+
By default, Hopsworks uses big data technologies (Spark or Hive) to create training data and read data for Python clients.
3+
This is great for large datasets, but for small or moderately sized datasets (think of the size of data that would fit in a Pandas
4+
DataFrame in your local Python environment), the overhead of starting a Spark or Hive job and doing distributed data processing can be significant.
5+
6+
ArrowFlight Server with DuckDB significantly reduces the time that Python clients need to read feature groups
7+
and batch inference data from the Feature Store, as well as creating moderately-sized in-memory training datasets.
8+
9+
When the service is enabled, clients will automatically use it for the following operations:
10+
11+
- [reading Feature Groups](https://docs.hopsworks.ai/feature-store-api/{{{ hopsworks_version }}}/generated/api/feature_group_api/#read)
12+
- [reading Queries](https://docs.hopsworks.ai/feature-store-api/{{{ hopsworks_version }}}/generated/api/query_api/#read)
13+
- [reading Training Datasets](https://docs.hopsworks.ai/feature-store-api/{{{ hopsworks_version }}}/generated/api/feature_view_api/#get_training_data)
14+
- [creating In-Memory Training Datasets](https://docs.hopsworks.ai/feature-store-api/{{{ hopsworks_version }}}/generated/api/feature_view_api/#training_data)
15+
- [reading Batch Inference Data](https://docs.hopsworks.ai/feature-store-api/{{{ hopsworks_version }}}/generated/api/feature_view_api/#get_batch_data)
16+
17+
For larger datasets, clients can still make use of the Spark/Hive backend by explicitly setting
18+
`read_options={"use_hive": True}`.
19+
20+
## Service configuration
21+
22+
!!! note
23+
Supported only on AWS at the moment.
24+
25+
!!! note
26+
Make sure that your cross account role has the load balancer permissions as described in [here](../../aws/restrictive_permissions/#load-balancers-permissions-for-external-access), otherwise you have to create and manage the load balancer yourself.
27+
28+
The ArrowFlight Server is co-located with RonDB in the Hopsworks cluster.
29+
If the ArrowFlight Server is activated, RonDB and ArrowFlight Server can each use up to 50%
30+
of the available resources on the node, so they can co-exist without impacting each other.
31+
Just like RonDB, the ArrowFlight Server can be replicated across multiple nodes to serve more clients at lower latency.
32+
To guarantee high performance, each individual ArrowFlight Server instance processes client requests sequentially.
33+
Requests will be queued for up to 10 minutes before they are rejected.
34+
35+
<p align="center">
36+
<figure>
37+
<img style="border: 1px solid #000" src="../../../assets/images/setup_installation/managed/common/arrowflight_rondb.png" alt="Configure RonDB">
38+
<figcaption>Activate ArrowFlight Server with DuckDB on a RonDB cluster</figcaption>
39+
</figure>
40+
</p>
41+
42+
To deploy ArrowFlight Server on a cluster:
43+
44+
1. Select "RonDB cluster"
45+
2. Select an instance type with at least 16GB of memory and 4 cores. (*)
46+
3. Tick the checkbox `Enable ArrowFlight Server`.
47+
48+
(*) The service should have at least the 2x the amount of memory available that a typical Python client would have.
49+
Because RonDB and ArrowFlight Server share the same node we recommend selecting an instance type with at least 4x the
50+
client memory. For example, if the service serves Python clients with typically 4GB of memory,
51+
an instance with at least 16GB of memory should be selected.
52+
An instance with 16GB of memory will be able to read feature groups and training datasets of up to 10-100M rows,
53+
depending on the number of columns and size of the features (~2GB in parquet). The same instance will be able to create
54+
point-in-time correct training datasets with 1-10M rows, also depending on the number and the size of the features.
55+
Larger instances are able to handle larger datasets. The numbers scale roughly linearly with the instance size.
56+
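The `read_options` fallback described in the new file, sketched with the `hopsworks` Python API (the feature group name is hypothetical):

```python
import hopsworks

project = hopsworks.login()
fs = project.get_feature_store()
fg = fs.get_feature_group("transactions", version=1)  # hypothetical name

# With the service enabled, a plain read is served by ArrowFlight/DuckDB.
df_small = fg.read()

# For datasets too large for in-memory processing, route the read
# through the Spark/Hive backend explicitly.
df_large = fg.read(read_options={"use_hive": True})
```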

docs/setup_installation/on_prem/external_kafka_cluster.md (+2 -2)

@@ -10,7 +10,7 @@ This guide will cover how to configure an Hopsworks cluster to leverage an exter
 
 ## Configure the external Kafka cluster integration
 
-To enable the integration with an external Kafka cluster, you should set the `enable_bring_your_own_kafka` [configuration option](../../admin/variables.md) to `true`.
+To enable the integration with an external Kafka cluster, you should set the `enable_bring_your_own_kafka` [configuration option](../admin/variables.md) to `true`.
 This can also be achieved in the cluster definition by setting the following attribute:
 
 ```

@@ -64,4 +64,4 @@ As mentioned above, when configuring Hopsworks to use an external Kafka cluster,
 
 Users should create a [Kafka storage connector](../../user_guides/fs/storage_connector/creation/kafka.md) named `kafka_connector` which is going to be used by the feature store clients to configure the necessary Kafka producers to send data.
 The configuration is done for each project to ensure its members have the necessary authentication/authorization.
-If the storage connector is not found in the project, default values referring to Hopsworks managed Kafka will be used.
\ No newline at end of file
+If the storage connector is not found in the project, default values referring to Hopsworks managed Kafka will be used.
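As a sketch of how clients resolve the connector named above, assuming the `hopsworks` Python client (the printed attribute is an assumption about the Kafka connector object):

```python
import hopsworks

project = hopsworks.login()
fs = project.get_feature_store()

# The name must be exactly "kafka_connector"; if the lookup fails,
# clients fall back to the Hopsworks-managed Kafka defaults.
connector = fs.get_storage_connector("kafka_connector")
print(connector.bootstrap_servers)  # assumed attribute on the connector
```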

docs/user_guides/fs/feature_group/data_validation.md (+1 -1)

@@ -63,7 +63,7 @@ First checkout the pre-requisite and Hopsworks setup to follow the guide below.
 In order to define and validate an expectation when writing to a Feature Group, you will need:
 
 - A Hopsworks project. If you don't have a project yet you can go to [app.hopsworks.ai](https://app.hopsworks.ai), signup with your email and create your first project.
-- An API key, you can get one by following the instructions [here](../../../setup_installation/common/api_key.md)
+- An API key; you can get one by going to "Account Settings" on [app.hopsworks.ai](https://app.hopsworks.ai).
 - The [Hopsworks Python library](https://pypi.org/project/hopsworks) installed in your client. See the [installation guide](../../client_installation/index.md).
 
 #### Connect your notebook to Hopsworks
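A minimal connection sketch for these prerequisites, assuming the `hopsworks` Python library; the host and project name are placeholders:

```python
import hopsworks

# The API key is the one created under "Account Settings";
# host and project are placeholders for your own values.
project = hopsworks.login(
    host="app.hopsworks.ai",
    project="my_project",
    api_key_value="YOUR_API_KEY",
)
fs = project.get_feature_store()
```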

docs/user_guides/fs/feature_group/data_validation_best_practices.md (+1 -1)

@@ -101,7 +101,7 @@ timeseries = pd.DataFrame(
 
 While checking your feature engineering pipeline executed properly in the morning can be good enough in the development phase, it won't make the cut for demanding production use-cases. In Hopsworks, you can setup alerts if ingestion fails or succeeds.
 
-First you will need to configure your preferred communication endpoint: slack, email or pagerduty. Check out [this page](../../../admin/alert.md) for more information on how to set it up. A typical use-case would be to add an alert on ingestion success to a Feature Group you created to hold data that failed validation. Here is a quick walkthrough:
+First you will need to configure your preferred communication endpoint: Slack, email or PagerDuty. Check out [this page](../../../setup_installation/admin/alert.md) for more information on how to set it up. A typical use-case would be to add an alert on ingestion success to a Feature Group you created to hold data that failed validation. Here is a quick walkthrough:
 
 1. Go the Feature Group page in the UI
 2. Scroll down and click on the `Add an alert` button.

docs/user_guides/fs/feature_group/feature_monitoring.md (+1 -1)

@@ -20,7 +20,7 @@ After that, you can optionally define a detection window of data to compute stat
 In order to setup feature monitoring for a Feature Group, you will need:
 
 - A Hopsworks project. If you don't have a project yet you can go to [app.hopsworks.ai](https://app.hopsworks.ai), signup with your email and create your first project.
-- An API key, you can get one by following the instructions [here](../../../setup_installation/common/api_key.md)
+- An API key; you can get one by going to "Account Settings" on [app.hopsworks.ai](https://app.hopsworks.ai).
 - The Hopsworks Python library installed in your client. See the [installation guide](../../client_installation/index.md).
 - A Feature Group

docs/user_guides/fs/feature_monitoring/index.md (+3 -3)

@@ -12,7 +12,7 @@ in Hopsworks and enable the user to visualise the temporal evolution of statisti
 - **Statistics Comparison**: Enabled only for individual features, this variant allows the user to schedule the statistics computation on both a _detection_ and a _reference window_. By providing information about how to compare those statistics, you can setup alerts to quickly detect critical change in the data. For more details, see the [Statistics comparison guide](statistics_comparison.md).
 
 !!! important
-    To enable feature monitoring in Hopsworks, you need to set the `enable_feature_monitoring` [configuration option](../../../admin/variables.md) to `true`.
+    To enable feature monitoring in Hopsworks, you need to set the `enable_feature_monitoring` [configuration option](../../../setup_installation/admin/variables.md) to `true`.
     This can also be achieved in the cluster definition by setting the following attribute:
 
     ```

@@ -42,9 +42,9 @@ Hopsworks provides an interactive graph to make the exploration of statistics an
 
 ## Alerting
 
-Moreover, feature monitoring integrates with the Hopsworks built-in system for [alerts](../../../admin/alert.md), enabling you to setup alerts that will notify you as soon as shift is detected in your feature values. You can setup alerts for feature monitoring at a Feature Group, Feature View, and project level.
+Moreover, feature monitoring integrates with the Hopsworks built-in system for [alerts](../../../setup_installation/admin/alert.md), enabling you to set up alerts that will notify you as soon as a shift is detected in your feature values. You can set up alerts for feature monitoring at the Feature Group, Feature View, and project level.
 
 !!! tip "Select the correct trigger"
     When configuring alerts for feature monitoring, make sure you select the `feature monitoring-shift detected` or `feature monitoring-shift undetected` trigger.
 
-![Feature monitoring alerts](../../../assets/images/guides/fs/feature_monitoring/fm-alerts.png)
\ No newline at end of file
+![Feature monitoring alerts](../../../assets/images/guides/fs/feature_monitoring/fm-alerts.png)

docs/user_guides/fs/feature_view/feature_monitoring.md (+1 -1)

@@ -20,7 +20,7 @@ After that, you can optionally define a detection window of data to compute stat
 In order to setup feature monitoring for a Feature View, you will need:
 
 - A Hopsworks project. If you don't have a project yet you can go to [app.hopsworks.ai](https://app.hopsworks.ai), signup with your email and create your first project.
-- An API key, you can get one by following the instructions [here](../../../setup_installation/common/api_key.md)
+- An API key; you can get one by going to "Account Settings" on [app.hopsworks.ai](https://app.hopsworks.ai).
 - The [Hopsworks Python library](https://pypi.org/project/hopsworks) installed in your client. See the [installation guide](../../client_installation/index.md).
 - A Feature View
 - A Training Dataset

docs/user_guides/fs/storage_connector/creation/redshift.md (+3 -3)

@@ -22,7 +22,7 @@ Before you begin this guide you'll need to retrieve the following information fr
 - **Database port:** The port of the cluster. Defaults to 5349.
 - **Authentication method:** There are three options available for authenticating with the Redshift cluster. The first option is to configure a username and a password.
 The second option is to configure an IAM role. With IAM roles, Jobs or notebooks launched on Hopsworks do not need to explicitly authenticate with Redshift, as the HSFS library will transparently use the IAM role to acquire a temporary credential to authenticate the specified user.
-Read more about IAM roles in our [AWS credentials pass-through guide](../../../../admin/roleChaining.md). Lastly,
+Read more about IAM roles in our [AWS credentials pass-through guide](../../../../setup_installation/admin/roleChaining.md). Lastly,
 option `Instance Role` will use the default ARN Role configured for the cluster instance.
 
 ## Creation in the UI

@@ -62,7 +62,7 @@ Enter the details for your Redshift connector. Start by giving it a **name** and
 By default, the session duration that the role will be assumed for is 1 hour or 3600 seconds.
 This means if you want to use the storage connector for example to [read or create an external Feature Group from Redshift](../usage.md##creating-an-external-feature-group), the operation cannot take longer than one hour.
 
-Your administrator can change the default session duration for AWS storage connectors, by first [increasing the max session duration of the IAM Role](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use.html#id_roles_use_view-role-max-session) that you are assuming. And then changing the `fs_storage_connector_session_duration` [configuration property](../../../../admin/variables.md) to the appropriate value in seconds.
+Your administrator can change the default session duration for AWS storage connectors by first [increasing the max session duration of the IAM Role](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use.html#id_roles_use_view-role-max-session) that you are assuming, and then changing the `fs_storage_connector_session_duration` [configuration property](../../../../setup_installation/admin/variables.md) to the appropriate value in seconds.
 
 ### Step 3: Upload the Redshift database driver (optional)

@@ -106,4 +106,4 @@ file, you can select it using the "From Project" option. To upload the jar file
 
 ## Next Steps
 
-Move on to the [usage guide for storage connectors](../usage.md) to see how you can use your newly created Redshift connector.
\ No newline at end of file
+Move on to the [usage guide for storage connectors](../usage.md) to see how you can use your newly created Redshift connector.
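For orientation, a hedged sketch of using the finished connector from Python, assuming the `hopsworks` client; the connector name and query are hypothetical. Note that the assumed-role session, 3600 seconds by default as described above, must outlive the read:

```python
import hopsworks

project = hopsworks.login()
fs = project.get_feature_store()

# "redshift_analytics" is a hypothetical connector name; with the IAM
# role options, HSFS acquires the temporary credentials transparently.
connector = fs.get_storage_connector("redshift_analytics")

# The query must complete within the role's session duration
# (fs_storage_connector_session_duration, default 3600 seconds).
df = connector.read(query="SELECT * FROM sales LIMIT 1000")
```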
