- Kube-prometheus stack for local Prometheus, Grafana, etc.
- Loki for local log storage and search
- Alloy for scraping, processing, and exporting observability events
- Install and configure Helm and helmfile (including setting the kubectl context for your cluster); a setup sketch follows this list.
- If using local storage for persistence, set up a storage class on your cluster that can dynamically provision persistent volumes. We use Rancher's local-path-provisioner by default.
- If using affinity rules, label nodes for scheduling monitoring pods, then set the affinity `nodeLabelKey` and `nodeLabelValue` values for each helm chart accordingly. Example:

  ```bash
  kubectl label node/your-node 'aistore.nvidia.com/role_monitoring=true'
  ```
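A minimal sketch of these prerequisites; the installer script and manifest URLs are the upstream defaults at the time of writing, and the kubectl context name is a placeholder:

```bash
# Install helm via the upstream installer script (review before piping to a shell).
curl -fsSL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash

# Install helmfile -- often packaged (e.g. `brew install helmfile` on macOS),
# or download a release binary from https://github.com/helmfile/helmfile/releases

# Point kubectl (and therefore helm/helmfile) at your cluster.
kubectl config use-context your-cluster-context

# Optional: install Rancher's local-path-provisioner for dynamic local PVs
# and mark it as the default storage class.
kubectl apply -f https://raw.githubusercontent.com/rancher/local-path-provisioner/master/deploy/local-path-storage.yaml
kubectl patch storageclass local-path -p \
  '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "true"}}}'
```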
- For a locally hosted Prometheus stack, including Grafana, start by following the instructions in the `kube-prom` directory.
- For locally hosted log storage, follow the instructions in the `loki` directory.
- Finally, follow the instructions in the `alloy` directory to deploy Grafana Alloy for scraping, processing, and forwarding both metrics and logs from various sources.
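Assuming each of those directories ships its own helmfile, the end-to-end order might look like the following sketch (the `-e prod` flag mirrors the examples below and may not apply to your environment):

```bash
# Deploy in dependency order: metrics stack, then log storage, then the collector.
(cd kube-prom && helmfile -e prod sync)
(cd loki && helmfile -e prod sync)
(cd alloy && helmfile -e prod sync)
```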
To use sensitive variables in your deployment, provide a `*.env` file and load it when running your helmfile commands.

Example template:

```bash
set -a; . ../your-env.env; set +a; helmfile -e prod template
```

Example sync:

```bash
set -a; . ../your-env.env; set +a; helmfile -e prod sync
```
Here are the currently referenced optional environment variables:

- `GRAFANA_PASSWORD`
- `MIMIR_LABEL`
- `MIMIR_ENDPOINT`
- `LOKI_LABEL`
- `LOKI_ENDPOINT`
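For illustration, a hypothetical `your-env.env`; every value below is a placeholder, and the endpoint paths are only typical Mimir/Loki push URLs, not values from this repository:

```bash
# Grafana admin password for the local stack.
GRAFANA_PASSWORD='change-me'

# Optional remote-write targets; labels distinguish this cluster's data.
MIMIR_LABEL='my-cluster'
MIMIR_ENDPOINT='https://mimir.example.com/api/v1/push'
LOKI_LABEL='my-cluster'
LOKI_ENDPOINT='https://loki.example.com/loki/api/v1/push'
```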
The web services for Prometheus and Grafana are not directly accessible from outside the cluster.
Options include changing the service types to NodePort
or using port-forwarding.
Default service names and ports:

| Tool | Service Name | Default Port |
| --- | --- | --- |
| Prometheus | `prometheus-kube-prometheus-prometheus` | 9090 |
| Grafana | `prometheus-grafana` | 80 |
| Loki Gateway | `loki-gateway` | 80 |
| Alloy | `alloy` | 12345 |
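To confirm the names and ports in your own cluster (assuming the charts are installed in the `monitoring` namespace, as in the commands below):

```bash
kubectl get svc -n monitoring
```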
Configure access from the host into the pod by using ONE of the following:

- Port-forward (substitute your release's Grafana service name, per the table above, if it differs):

  ```bash
  kubectl port-forward --namespace monitoring service/kube-prometheus-stack-grafana 3000:80
  ```

- Patch the service to use NodePort:

  ```bash
  kubectl patch svc kube-prometheus-stack-grafana -n monitoring -p '{"spec": {"type": "NodePort"}}'
  ```

- Create a separate NodePort or LoadBalancer service (see the Kubernetes docs); a sketch follows this list.
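For that last option, a minimal sketch of a standalone NodePort service for Grafana; the selector label and target port are typical of the Grafana chart but should be verified against your release (e.g. with `kubectl get svc -n monitoring -o yaml`):

```bash
kubectl apply -n monitoring -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: grafana-nodeport
spec:
  type: NodePort
  # Copy the selector from the existing Grafana service in your release.
  selector:
    app.kubernetes.io/name: grafana
  ports:
    - port: 80
      targetPort: 3000   # Grafana's container port in the chart
      nodePort: 30300    # any free port in the cluster's NodePort range
EOF
```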
If needed, use an ssh tunnel to access the k8s host:

```bash
ssh -L <port>:localhost:<port> <user-name>@<ip-or-host-name>
```

and view `localhost:<port>`.
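For example, if Grafana was port-forwarded to port 3000 on the k8s host (user and host names are placeholders):

```bash
ssh -L 3000:localhost:3000 user@k8s-host
# then browse to http://localhost:3000 on your workstation
```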
For Grafana, log in with the `admin` user and the password set with the `GRAFANA_PASSWORD` environment variable.
Example output: