Commit 922fd0c

fix(prometheus): alert when a job actively fails, not when it is either running or failed (#108)
Parent: d52943b

1 file changed: +2 -2 lines changed

etcd-backup-cronjob-monitor.PrometheusRule.yaml

@@ -3,7 +3,7 @@
 # This PrometheusRule alerts if a etcd-backup job has failed or was not scheduled.
 #
 # For detailed explanation on how it works, please see:
-# https://wiki.adfinis.com/adfinis/index.php/Red_Hat_OpenShift_Container_Platform/Backup_Restore/etcd-backup_4.7#Monitoring
+# https://wiki.adfinis.com/adfinis/index.php/Red_Hat_OpenShift_Container_Platform/Backup_Restore
 #
 # Apply with:
 # oc apply -n etcd-backup -f etcd-backup-cronjob-monitor.PrometheusRule.yaml
@@ -19,6 +19,6 @@ spec:
   rules:
   - alert: EtcdBackupCronJobStatusFailed
     expr: |
-      kube_job_status_succeeded{namespace="etcd-backup"} == 0
+      kube_job_status_failed{namespace="etcd-backup"} > 0
     labels:
       severity: critical
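
The effect of the expression change can be illustrated with a small Python model. This is only a sketch under stated assumptions: the JobSample class, the helper functions, and the three sample jobs are hypothetical and not part of the commit. It assumes a running job reports 0 for both kube_job_status_succeeded and kube_job_status_failed, which is what the commit title describes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobSample:
    """Hypothetical snapshot of the two job gauges for a single Job."""
    name: str
    succeeded: int  # stands in for kube_job_status_succeeded
    failed: int     # stands in for kube_job_status_failed


def old_expr_fires(job: JobSample) -> bool:
    # Old rule: kube_job_status_succeeded{namespace="etcd-backup"} == 0
    return job.succeeded == 0


def new_expr_fires(job: JobSample) -> bool:
    # New rule: kube_job_status_failed{namespace="etcd-backup"} > 0
    return job.failed > 0


# Assumed sample values, chosen only to illustrate the three job states.
SAMPLES = [
    JobSample("running", succeeded=0, failed=0),
    JobSample("failed", succeeded=0, failed=1),
    JobSample("succeeded", succeeded=1, failed=0),
]


def firing(predicate, samples):
    """Return the names of the samples for which the predicate holds."""
    return [s.name for s in samples if predicate(s)]


if __name__ == "__main__":
    print("old expr fires for:", firing(old_expr_fires, SAMPLES))
    print("new expr fires for:", firing(new_expr_fires, SAMPLES))
```

In this model the old expression matches both the running and the failed job, while the new expression matches only the failed one, so a backup that is still in progress should no longer raise a critical alert.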
