
Commit a63cfec

Update instructions for exporting Prometheus metrics (#446) (#456)
1 parent e2217c3 commit a63cfec

1 file changed (+83 -39 lines)

docs/setup_installation/admin/monitoring/export-metrics.md (+83 -39)
@@ -2,78 +2,122 @@
 
 ## Introduction
 Hopsworks services produce metrics which are centrally gathered by [Prometheus](https://prometheus.io/) and visualized in [Grafana](../grafana).
-Although the system is self-contained, it is possible to export these metrics to third-party services or another Prometheus instance.
+Although the system is self-contained, it is possible for another *federated* Prometheus instance to scrape these metrics, or for Prometheus to push them directly to another system.
 This is useful if you have a centralized monitoring system with already configured alerts.
 
 ## Prerequisites
-In order to configure Prometheus to export metrics you need `root` SSH access to either Hopsworks or to the target server depending on the export method you choose below.
+To configure Prometheus to export metrics, you need permission to change the configuration of the remote Prometheus instance and, depending on the export method, access to the Hopsworks Kubernetes cluster.
 
 ## Exporting metrics
 Prometheus can be configured to export metrics to another Prometheus instance (cross-service federation) or to a custom service which knows how to handle them.
 
 ### Prometheus federation
-Prometheus servers can be federated to scale better or to just clone all metrics (cross-service federation). Prometheus federation is well [documented](https://prometheus.io/docs/prometheus/latest/federation/#cross-service-federation)
-but there are some specificities to Hopsworks.
+Prometheus servers can be federated to scale better or to just clone all metrics (cross-service federation).
 
 In the guide below we assume **Prometheus A** is the service running in Hopsworks and **Prometheus B** is the server you want to clone metrics to.
 
 #### Step 1
-**Prometheus B** needs to be able to connect to TCP port `9089` of **Prometheus B** to scrape metrics. If you have any firewall (or Security Group) in place, allow ingress for that port.
+**Prometheus B** needs to be able to connect to TCP port `9090` of **Prometheus A** to scrape metrics. If you have any firewall (or Security Group) in place, allow ingress for that port.
 
 #### Step 2
-SSH into **Prometheus B** server, edit Prometheus configuration file and add the following under the `scrape_configs`
+The next step is to expose **Prometheus A**, which runs inside the Hopsworks Kubernetes cluster. If **Prometheus B** has direct access to **Prometheus A**, you can skip this step.
+
+We will create a Kubernetes *Service* of type *LoadBalancer* to expose port `9090`.
+
+!!! warning
+    If you need to apply custom **annotations**, modify the manifest below.
+    The example below assumes Hopsworks is **installed** in the *hopsworks* namespace.
+
+```bash
+kubectl apply -f - <<EOF
+apiVersion: v1
+kind: Service
+metadata:
+  name: prometheus-external
+  namespace: hopsworks
+  labels:
+    app: prometheus
+spec:
+  type: LoadBalancer
+  selector:
+    app.kubernetes.io/name: prometheus
+    app.kubernetes.io/component: server
+  ports:
+    - protocol: TCP
+      port: 9090
+      targetPort: 9090
+EOF
+```
+
+Next, find the external IP address of the newly created Service:
+
+```bash
+export NAMESPACE=hopsworks
+kubectl -n "$NAMESPACE" get svc prometheus-external -ojsonpath='{.status.loadBalancer.ingress[0].ip}'
+```
+
+!!! warning
+    It can take a few seconds until an IP address is assigned to the Service. On some providers (for example AWS) the load balancer reports a hostname instead of an IP; in that case query `{.status.loadBalancer.ingress[0].hostname}`.
+
+We will use this IP address in Step 3.
+
+#### Step 3
+Edit the configuration file of the **Prometheus B** server and append the following job under `scrape_configs`:
 
 !!! note
-    Replace IP_ADDRESS with the actual address of Hopsworks server
+    Replace IP_ADDRESS with the IP address from Step 2, or with the IP address of the Prometheus service if it is directly accessible.
+    The snippet below assumes Hopsworks services run in the **hopsworks** namespace.
 
 ```yaml
 - job_name: 'federate'
-  scrape_interval: 15s
+  scrape_interval: 15s
 
-  honor_labels: true
-  metrics_path: '/federate'
+  honor_labels: true
+  metrics_path: '/federate'
 
-  params:
-    'match[]':
-      - '{job="airflow"}'
-      - '{job="pushgateway"}'
-      - '{job="hadoop"}'
-      - '{job="hopsworks"}'
+  params:
+    'match[]':
+      - '{namespace="hopsworks"}'
 
-  static_configs:
-    - targets:
-      - 'IP_ADDRESS:9089'
+  static_configs:
+    - targets:
+      - 'IP_ADDRESS:9090'
 ```
 
-These are the basic labels gathered by Hopsworks.
+The configuration above scrapes service metrics from the *hopsworks* namespace. If you also want to
+scrape *user application* metrics, append `'{job="pushgateway"}'` to the matchers, for example:
 
-* If your Hopsworks cluster runs **without** Kubernetes append `'{job="cadvisor"}'` to `match[]` list
-
-* If your Hopsworks cluster runs **with** Kubernetes append the following labels to `match[]`
-    * `'{job=~"knative.+"}'`
-    * `'{job="kubernetes-cadvisor"}'`
-    * `'{job="istio-envoy"}'`
-    * `'{job="kube-state-metrics"}'`
-    * `'{job="cadvisor"}'`
-    * `'{job="cadvisor"}'`
-    * `'{job="cadvisor"}'`
+```yaml
+params:
+  'match[]':
+    - '{namespace="hopsworks"}'
+    - '{job="pushgateway"}'
+```
 
-#### Step 3
-Finally restart Prometheus service with `sudo systemctl restart prometheus`
+Depending on your Prometheus setup, you might need to restart the **Prometheus B** service for it to pick up the new configuration.
+For more details on federation, see the Prometheus [documentation](https://prometheus.io/docs/prometheus/latest/federation/#cross-service-federation).
 
 ### Custom service
 Prometheus can push metrics to another custom resource via HTTP. The custom service is responsible for handling the received metrics.
 To push metrics with this method we use the `remote_write` configuration.
 
-
 We will only give a sample configuration as `remote_write` is extensively documented in Prometheus [documentation](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#remote_write)
 In the example below we push metrics to a custom service listening on port 9096 which transforms the metrics and forwards them.
 
+To configure Prometheus to push metrics to a remote HTTP service, add the following snippet to your Helm chart values file, changing the *url* accordingly. You can also tweak the other configuration parameters to your needs.
+
 ```yaml
-remote_write:
-  - url: "http://localhost:9096"
-    queue_config:
-      capacity: 10000
-      max_samples_per_send: 5000
-      batch_send_deadline: 60s
+prometheus:
+  prometheus:
+    server:
+      remoteWrite:
+        - url: "http://localhost:9096"
+          queue_config:
+            capacity: 10000
+            max_samples_per_send: 5000
+            batch_send_deadline: 60s
 ```
+
+If the `prometheus` section already exists in your values file, add the `remoteWrite` section under it instead of creating a second one.
+
+Run `helm install` if this is the first time you install Hopsworks, or `helm upgrade` to apply the change to an existing cluster.
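The federation steps in the updated guide (waiting for the LoadBalancer IP, then checking what **Prometheus B** will scrape) can be scripted. Below is a minimal sketch, not part of this commit or of Hopsworks, assuming a POSIX-compatible shell with `kubectl` and `curl` configured for your cluster. The helper names `wait_for_lb_ip` and `federate_sample` are hypothetical; the defaults (`hopsworks`, `prometheus-external`, port `9090`, the `{namespace="hopsworks"}` selector) are taken from the manifest and scrape job in the diff.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of Hopsworks) for the federation steps.
# Assumes kubectl and curl are installed and configured for your cluster.

# Poll the LoadBalancer Service until an external IP is assigned, then print it.
# Args: namespace, Service name, max attempts, seconds between attempts.
wait_for_lb_ip() {
  local ns="${1:-hopsworks}" svc="${2:-prometheus-external}"
  local tries="${3:-30}" delay="${4:-2}"
  local i=0 ip=""
  while [ "$i" -lt "$tries" ]; do
    ip=$(kubectl -n "$ns" get svc "$svc" \
      -o jsonpath='{.status.loadBalancer.ingress[0].ip}' 2>/dev/null)
    if [ -n "$ip" ]; then
      printf '%s\n' "$ip"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "timed out waiting for an external IP on $ns/$svc" >&2
  return 1
}

# Print the /federate output that Prometheus B would scrape, using the
# same match[] selector as the 'federate' job (override with argument 2).
federate_sample() {
  local ip="$1" selector='{namespace="hopsworks"}'
  if [ -n "${2:-}" ]; then
    selector="$2"
  fi
  # -g disables curl URL globbing; -G sends the urlencoded match[] as a GET query.
  curl -f -s -g -G "http://${ip}:9090/federate" \
    --data-urlencode "match[]=${selector}"
}
```

For example, `IP=$(wait_for_lb_ip hopsworks prometheus-external)` followed by `federate_sample "$IP" | head` should print a few federated series if the port opened in Step 1 is reachable. The helper only reads `.ingress[0].ip`; if your load balancer reports a hostname instead, adjust the jsonpath accordingly.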
