Commit 56fc743

New documentation structure for 3.0 (logicalclocks#14)
1 parent 6194faf commit 56fc743

688 files changed: +51111 -278 lines changed


.github/workflows/mkdocs-main.yml

-33
This file was deleted.

docs/admin/alert.md

+109
@@ -0,0 +1,109 @@
# Configure Alerts

## Introduction
Alerts are sent from Hopsworks using Prometheus'
[Alert manager](https://prometheus.io/docs/alerting/latest/alertmanager/).
In order to send alerts we first need to configure the _Alert manager_.

## Prerequisites
Administrator account on a Hopsworks cluster.

### Step 1: Go to alerts configuration
To configure the _Alert manager_ click on your name in the top right corner of the navigation bar and choose
Cluster Settings from the dropdown menu. In the Cluster Settings' Alerts tab you can configure the alert
manager to send alerts via email, Slack or PagerDuty.

<figure>
  <a href="../../assets/images/alerts/configure-alerts.png">
    <img src="../../assets/images/alerts/configure-alerts.png" alt="Configure alerts"/>
  </a>
  <figcaption>Configure alerts</figcaption>
</figure>

### Step 2: Configure Email Alerts
To send alerts via email you need to configure an SMTP server. Click on the _Configure_
button on the left side of the **email** row and fill out the form that pops up.

<figure>
  <a href="../../assets/images/alerts/smtp-config.png">
    <img src="../../assets/images/alerts/smtp-config.png" alt="Configure Email Alerts"/>
  </a>
  <figcaption>Configure Email Alerts</figcaption>
</figure>

- _Default from_: the address used as sender in the alert email.
- _SMTP smarthost_: the Simple Mail Transfer Protocol (SMTP) host through which emails are sent.
- _Default hostname (optional)_: hostname to identify to the SMTP server.
- _Authentication method_: how to authenticate to the SMTP server (CRAM-MD5, LOGIN or PLAIN).

Optionally, cluster-wide email alert receivers can be added in _Default receiver emails_.
These receivers will be available to all users when they create event-triggered [alerts](../../user_guides/projects/jobs/alert).

### Step 3: Configure Slack Alerts
Alerts can also be sent via Slack messages. To be able to send Slack messages you first need to configure
a Slack webhook. Click on the _Configure_ button on the left side of the **slack** row and paste your
[Slack webhook](https://api.slack.com/messaging/webhooks) in _Webhook_.

<figure>
  <a href="../../assets/images/alerts/slack-config.png">
    <img src="../../assets/images/alerts/slack-config.png" alt="Configure Slack Alerts"/>
  </a>
  <figcaption>Configure Slack Alerts</figcaption>
</figure>

Optionally, cluster-wide Slack alert receivers can be added in _Slack channel/user_.
These receivers will be available to all users when they create event-triggered [alerts](../../user_guides/projects/jobs/alert).
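
Behind the scenes, the _Alert manager_ delivers Slack messages through a Slack receiver in its configuration. Below is a minimal, illustrative sketch of such a receiver; the webhook URL, receiver name and channel are placeholders, not values generated by Hopsworks.

```yaml
global:
  # Placeholder webhook URL; use the one from your Slack app.
  slack_api_url: https://hooks.slack.com/services/T0000000/B0000000/XXXXXXXXXXXXXXXXXXXXXXXX

receivers:
  - name: slack-alerts                 # hypothetical receiver name
    slack_configs:
      - channel: '#hopsworks-alerts'   # placeholder channel
        send_resolved: true
```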

### Step 4: Configure PagerDuty Alerts
PagerDuty is another way you can send alerts from Hopsworks. Click on the _Configure_ button on the left side of
the **pagerduty** row and fill out the form that pops up.

<figure>
  <a href="../../assets/images/alerts/pagerduty-config.png">
    <img src="../../assets/images/alerts/pagerduty-config.png" alt="Configure PagerDuty Alerts"/>
  </a>
  <figcaption>Configure PagerDuty Alerts</figcaption>
</figure>

Fill in the _PagerDuty URL_: the URL to send API requests to.

Optionally, cluster-wide PagerDuty alert receivers can be added in _Service key/Routing key_,
by first choosing the PagerDuty integration type:

- _global event routing (routing_key)_: when using PagerDuty integration type `Events API v2`.
- _service (service_key)_: when using PagerDuty integration type `Prometheus`.

Then add the Service key/Routing key of the receiver(s). PagerDuty provides
[documentation](https://www.pagerduty.com/docs/guides/prometheus-integration-guide/) on how to integrate with
Prometheus' Alert manager.
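
Under the hood, alerts reach PagerDuty through a `pagerduty_configs` receiver in the _Alert manager_ configuration. The sketch below is illustrative only; the receiver name and keys are placeholders. Use `routing_key` with the `Events API v2` integration type, or `service_key` with the `Prometheus` integration type.

```yaml
receivers:
  - name: pagerduty-alerts             # hypothetical receiver name
    pagerduty_configs:
      # Events API v2 integration type:
      - routing_key: 0123456789abcdef0123456789abcdef
      # For the Prometheus integration type, use service_key instead:
      # - service_key: 0123456789abcdef0123456789abcdef
```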

### Step 5: Advanced configuration
If you are familiar with Prometheus' [Alert manager](https://prometheus.io/docs/alerting/latest/alertmanager/)
you can also configure alerts by editing the _yaml/json_ file directly.

<figure>
  <a href="../../assets/images/alerts/advanced-config.png">
    <img src="../../assets/images/alerts/advanced-config.png" alt="Advanced configuration"/>
  </a>
  <figcaption>Advanced configuration</figcaption>
</figure>

_Example:_ Adding the yaml snippet shown below in the global section of the alert manager configuration will
have the same effect as creating the SMTP configuration shown in [Step 2](#step-2-configure-email-alerts) above.

```yaml
global:
  smtp_smarthost: smtp.gmail.com:587
  smtp_from: hopsworks@gmail.com
  smtp_auth_username: hopsworks@gmail.com
  smtp_auth_password: XXXXXXXXX
  smtp_auth_identity: hopsworks@gmail.com
...
```
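
The advanced editor can also be used to define receivers and routes directly. The snippet below is a minimal, illustrative sketch (the receiver names, the label matcher and the email address are placeholders, not values generated by Hopsworks) showing how a default email receiver and a route to a PagerDuty receiver could be wired together:

```yaml
route:
  receiver: default-email              # hypothetical default receiver
  group_by: ['alertname']
  routes:
    - match:
        severity: critical             # placeholder label matcher
      receiver: pagerduty-alerts       # hypothetical receiver defined under receivers

receivers:
  - name: default-email
    email_configs:
      - to: admin@example.com          # placeholder address
```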

To test the alerts by creating triggers from Jobs and Feature group validations, see [Alerts](../../user_guides/projects/jobs/alert).

## Conclusion
In this guide you learned how to configure alerts in Hopsworks.

docs/admin/auth.md

+48
@@ -0,0 +1,48 @@
# Authentication Methods

## Introduction
Hopsworks can be configured to use different types of authentication methods. In this guide we will look at the
different authentication methods available in Hopsworks.

## Prerequisites
Administrator account on a Hopsworks cluster.

### Step 1: Go to Authentication methods page

To configure authentication methods click on your name in the top right corner of the navigation bar and choose
**Cluster Settings** from the dropdown menu.

### Step 2: Configure Authentication methods
In the **Cluster Settings** _Authentication_ tab you can configure how users authenticate.

1. **TOTP Two-factor Authentication**: can be _disabled_, _optional_ or _mandatory_. If set to mandatory, all users are
   required to set up two-factor authentication when registering.

    !!! note

        If two-factor authentication is set to _mandatory_ on a cluster with preexisting users, all users will need to go
        through the lost device recovery step to enable it. So consider setting it to _optional_ first and allowing users
        to enable it before making it mandatory.

2. **OAuth2**: if your organization already has an identity management system compatible with
   [OpenID Connect (OIDC)](https://openid.net/connect/) you can configure Hopsworks to use your identity provider
   by enabling **OAuth** as shown in the figure below. After enabling OAuth
   you can register your identity provider by clicking on the **Add Identity Provider** button. See
   [Create client](../oauth2/create-client) for details.
3. **LDAP/Kerberos**: if your organization is using LDAP or Kerberos to manage users and services you can configure
   Hopsworks to use it as the user management system. You can enable LDAP/Kerberos by clicking on the checkbox,
   as shown in the figure below, and choosing LDAP or Kerberos. For more information on how to configure LDAP and Kerberos see
   [Configure LDAP](../ldap/configure-ldap) and [Configure Kerberos](../ldap/configure-krb).

<figure>
  <a href="../../assets/images/admin/auth-config.png">
    <img src="../../assets/images/admin/auth-config.png" alt="Authentication config" />
  </a>
  <figcaption>Setup Authentication Methods</figcaption>
</figure>

In the figure above we see a cluster with two-factor authentication disabled, OAuth enabled with one registered
identity provider, and LDAP authentication enabled.

## Conclusion
In this guide you learned how to configure authentication methods in Hopsworks.

docs/admin/ha-dr/dr.md

+139
@@ -0,0 +1,139 @@
# Disaster Recovery

## Backup
The state of the Hopsworks cluster is divided into data and metadata and distributed across the different node groups. This section of the guide describes how to take a consistent backup of the data in the offline and online feature store as well as of the metadata.

The following services contain critical state that should be backed up:

* **RonDB**: RonDB is used by Hopsworks to store the cluster metadata as well as the data for the online feature store.
* **HopsFS**: HopsFS stores the data for the batch feature store as well as checkpoints and logs for feature engineering applications.

Backing up service/application metrics and logs is out of the scope of this guide. By default, metrics and logs are rotated after 7 days. Application logs are available on HopsFS when the application has finished and, as such, are backed up with the rest of HopsFS’ data.

Apache Kafka and OpenSearch are additional services maintaining state. The OpenSearch metadata can be reconstructed from the metadata stored on RonDB.

Apache Kafka is used in Hopsworks to store the in-flight data that is on its way to the online feature store. In the event of a total loss of the cluster, running jobs with in-flight data will have to be replayed.

### Configuration Backup

Hopsworks adopts an Infrastructure-as-Code philosophy: all the configuration files for the different Hopsworks services are generated during the deployment phase. Cluster-specific customizations should be centralized in the cluster definition used to deploy the cluster. The cluster definition should therefore be backed up (e.g., by committing it to a git repository) so that the same cluster can be recreated if needed.
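
As a minimal sketch, assuming the cluster definition is kept in a file called `cluster-definition.yml` (the path, file name and remote are hypothetical), versioning it could look like:

```sh
# Hypothetical path, file name and remote; adapt to your environment.
cd /path/to/cluster-definition
git add cluster-definition.yml
git commit -m "Backup Hopsworks cluster definition"
git push origin main
```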

### RonDB Backup

The RonDB backup is divided into two parts: the users and privileges backup and the data backup.

To take the backup of users and privileges you can run the following command from any of the nodes in the head node group. This command generates a SQL file containing all the user definitions for both the metadata services (Hopsworks, HopsFS, Metastore) as well as the users and permission grants for the online feature store. This command needs to be run as user ‘mysql’ or with sudo privileges.

```sh
/srv/hops/mysql/bin/mysqlpump -S /srv/hops/mysql-cluster/mysql.sock --exclude-databases=% --exclude-users=root,mysql.sys,mysql.session,mysql.infoschema --users > users.sql
```

The second step is to trigger the backup of the data. This can be achieved by running the following command as user ‘mysql’ on one of the nodes of the head node group.

```sh
/srv/hops/mysql-cluster/ndb/scripts/mgm-client.sh -e "START BACKUP [replace_backup_id] SNAPSHOTEND WAIT COMPLETED"
```

The backup ID is an integer greater than or equal to 1. The backup script referenced below uses `$(date +'%y%m%d%H%M')` instead of a fixed integer as backup ID to make it easier to identify backups over time.
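
For example, a minimal sketch of taking a backup with a date-based ID (the same `START BACKUP` command as above, with the placeholder filled in from the current time):

```sh
# Use a timestamp as the backup ID, e.g. 2310271530 for 2023-10-27 15:30.
BACKUP_ID=$(date +'%y%m%d%H%M')
/srv/hops/mysql-cluster/ndb/scripts/mgm-client.sh -e "START BACKUP ${BACKUP_ID} SNAPSHOTEND WAIT COMPLETED"
```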

The command instructs each RonDB datanode to back up the data it is responsible for. The backup will be located locally on each datanode under the following path (the directory name will be `BACKUP-[backup_id]`):

```sh
/srv/hops/mysql-cluster/ndb/backups/BACKUP
```

A more comprehensive backup script is available [here](https://github.com/logicalclocks/ndb-chef/blob/master/templates/default/native_ndb_backup.sh.erb) - the script includes the steps above as well as collecting all the partial RonDB backups on a single node. The script is a good starting point and can be adapted to ship the database backup outside the cluster.
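
For instance, a minimal sketch of shipping a finished backup off-cluster could look like the following; the destination host and path are hypothetical.

```sh
# Copy this node's backup to a hypothetical backup host; replace [backup_id] as above.
scp -r /srv/hops/mysql-cluster/ndb/backups/BACKUP/BACKUP-[backup_id] backup-host.example.com:/backups/rondb/$(hostname)/
```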

### HopsFS Backup

HopsFS is a distributed file system based on Apache HDFS. HopsFS stores its metadata in RonDB, so the metadata backup has already been covered in the section above. The data is stored in the form of blocks on the different datanodes.
For availability reasons, the blocks are replicated across three different datanodes.

Within a node, the blocks are stored by default under the following directory, under the ownership of the ‘hdfs’ user:

```sh
/srv/hopsworks-data/hops/hopsdata/hdfs/dn/
```

To safely back up all the data, a copy of all the datanodes should be taken. As the data is replicated across the different nodes, excluding a set of nodes might result in data loss.

Additionally, as HopsFS blocks are files on the file system and the file system can be quite large, the backup is not transactional. Consistency is dictated by the metadata: blocks added during the copying process will not be visible when restoring, as they are not part of the metadata backup taken prior to cloning the HopsFS blocks.
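
A minimal sketch of copying one datanode's block directory to a backup location is shown below; the destination host and path are hypothetical, and the copy must be repeated on every datanode.

```sh
# Run on each HopsFS datanode; backup-host.example.com and the target path are placeholders.
rsync -a /srv/hopsworks-data/hops/hopsdata/hdfs/dn/ backup-host.example.com:/backups/hopsfs/$(hostname)/
```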

When the HopsFS data blocks are stored in a cloud block storage service, for example Amazon S3, it is sufficient to back up only the metadata. The blob storage service will ensure durability of the data blocks.

## Restore

As with the backup phase, the restore operation is broken down into different steps.

### Cluster deployment

The first step to redeploy the cluster is to redeploy the binaries and configuration. You should reuse the same cluster definition used to deploy the first (original) cluster. This will re-create the same cluster with the same configuration.

### RonDB restore

The deployment step above created a functioning empty cluster. To restore the cluster, the first step is to restore the metadata and online feature store data stored on RonDB.
To restore the state of RonDB, we first need to restore its schemas and tables, then its data, rebuild the indices, and finally restore the users and grants.

#### Restore RonDB schemas and tables

This command should be executed on one of the nodes in the head node group and is going to recreate the schemas, tables, and internal RonDB metadata. In the command below, you should replace node_id with the id of the node you are running the command on, and backup_id with the id of the backup you want to restore. Finally, you should replace mgm_node_ip with the address of the node where the RonDB management service is running.

```sh
/srv/hops/mysql/bin/ndb_restore -n [node_id] -b [backup_id] -m --disable-indexes --ndb-connectstring=[mgm_node_ip]:1186 --backup_path=/srv/hops/mysql-cluster/ndb/backups/BACKUP/BACKUP-[backup_id]
```

#### Restore RonDB data

This command should be executed on all the RonDB datanodes. Each command should be customized with the node id of the node you are trying to restore (i.e., replace node_id). As for the command above, you should replace backup_id and mgm_node_ip.

```sh
/srv/hops/mysql/bin/ndb_restore -n [node_id] -b [backup_id] -r --ndb-connectstring=[mgm_node_ip]:1186 --backup_path=/srv/hops/mysql-cluster/ndb/backups/BACKUP/BACKUP-[backup_id]
```

#### Rebuild the indices

In the first restore command we disabled the indices for recovery. This last command takes care of enabling them again. It needs to run only once, on one of the nodes of the head node group. As for the commands above, you should replace node_id, backup_id and mgm_node_ip.

```sh
/srv/hops/mysql/bin/ndb_restore -n [node_id] -b [backup_id] --rebuild-indexes --ndb-connectstring=[mgm_node_ip]:1186 --backup_path=/srv/hops/mysql-cluster/ndb/backups/BACKUP/BACKUP-[backup_id]
```

#### Restore Users and Grants

In the backup phase, we took the backup of the users and grants separately. The last step of the RonDB restore process is to re-create all the users and grants, both for the Hopsworks services and for the online feature store users. This can be achieved by running the following command on one node of the head node group:

```sh
/srv/hops/mysql-cluster/ndb/scripts/mysql-client.sh source users.sql
```

### HopsFS restore

With the metadata restored, you can now proceed to restore the file system blocks on HopsFS and restart the file system. When starting, each datanode will advertise its ID/ClusterID and Storage ID based on the VERSION file that can be found in this directory:

```sh
/srv/hopsworks-data/hops/hopsdata/hdfs/dn/current
```

It’s important that all the datanodes are restored and that they report their blocks to the namenode processes running on the head nodes. By default, the namenodes in HopsFS will exit “SAFE MODE” (i.e., the mode that allows only read operations) only when the datanodes have reported 99.9% of the blocks the namenodes have in the metadata. As such, the namenodes will not resume operations until all the file blocks have been restored.

### OpenSearch state rebuild

The OpenSearch state can be rebuilt using the Hopsworks metadata stored on RonDB. The rebuild process is done by using the re-indexing mechanism provided by ePipe.
The re-indexing can be triggered by running the following command on the head node where ePipe is running:

```sh
/srv/hops/epipe/bin/reindex-epipe.sh
```

The script is deployed and configured during the platform deployment.

### Kafka topics rebuild

The backup and restore plan doesn’t cover the data in transit in Kafka; the jobs producing it will have to be replayed. However, the RonDB backup contains the information necessary to recreate the topics of all the feature groups.
You can run the following command, as super user, to recreate all the topics with the correct partitioning and replication factors:

```sh
/srv/hops/kafka/bin/kafka-restore.sh
```

The script is deployed and configured during the platform deployment.
