Update README & CHANGELOG for quorum commit
blogh committed Dec 24, 2024
1 parent b27d898 commit 31cfd89
Showing 2 changed files with 38 additions and 17 deletions.
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,10 @@

## unreleased

### Added

* Support for quorum synchronous replication (#80)

### Fixed

* Update the documentation to clarify that if patroni cannot be reached, we consider
51 changes: 34 additions & 17 deletions README.md
@@ -212,14 +212,15 @@ Options:
```
Usage: check_patroni cluster_has_replica [OPTIONS]
Check if the cluster has healthy replicas and/or if some are sync standbies
Check if the cluster has healthy replicas and/or if some are sync or quorum
standbies
For patroni (and this check):
* a replica is `streaming` if the `pg_stat_wal_receiver` says so.
* a replica is `in archive recovery`, if it's not `streaming` and has a `restore_command`.
A healthy replica:
* has a `replica` or `sync_standby` role
* has a `replica`, `quorum_standby` or `sync_standby` role
* has the same timeline as the leader and
* is in `running` state (patroni < V3.0.4)
* is in `streaming` or `in archive recovery` state (patroni >= V3.0.4)
@@ -236,27 +237,41 @@ Usage: check_patroni cluster_has_replica [OPTIONS]
switchover or failover and the standbies are in the process of catching up
with the new leader. The alert shouldn't last long.
In PostgreSQL, synchronous replication has two modes, on and quorum, and is
configured with the GUCs `synchronous_standby_names` and
`synchronous_commit`. Patroni uses the parameter `synchronous_mode`, which
can be set to `on`, `quorum` or `off`, and has `synchronous_node_count` to
configure the synchronous replication factor. Please note that, in
synchronous replication, the number of servers tagged as
"{sync|quorum}_standby" (what we measure) is not always equal to
`synchronous_node_count`.
Check:
* `OK`: if the healthy_replica count and their lag are compatible with the replica count threshold.
and if the sync_replica count is compatible with the sync replica count threshold.
and if the synchronous replica count is compatible with the sync replica count threshold.
* `WARNING` / `CRITICAL`: otherwise
Perfdata:
* healthy_replica & unhealthy_replica count
* the number of sync_replica, they are included in the previous count
* the number of sync_replica (sync or quorum depending on `--sync-type`), they are included
in the previous count
* the lag of each replica labelled with "member name"_lag
* the timeline of each replica labelled with "member name"_timeline
* a boolean to tell if the node is a sync standby labelled with "member name"_sync
Options:
-w, --warning TEXT Warning threshold for the number of healthy replica
nodes.
-c, --critical TEXT Critical threshold for the number of healthy replica
nodes.
--sync-warning TEXT Warning threshold for the number of sync replica.
--sync-critical TEXT Critical threshold for the number of sync replica.
--max-lag TEXT maximum allowed lag
--help Show this message and exit.
-w, --warning TEXT Warning threshold for the number of healthy
replica nodes.
-c, --critical TEXT Critical threshold for the number of healthy
replica nodes.
--sync-warning TEXT Warning threshold for the number of sync
replica.
--sync-critical TEXT Critical threshold for the number of sync
replica.
--sync-type [any|sync|quorum] Synchronous replication mode used to filter
and count sync standbies. [default: any]
--max-lag TEXT maximum allowed lag
--help Show this message and exit.
```
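
The counting rules described above can be sketched in a few lines of Python. This is only an illustration and not check_patroni's actual implementation: the function name `count_replicas`, the member dictionary keys (`role`, `state`, `timeline`) and the choice to count sync replicas among healthy replicas only are assumptions made for the example. The version-dependent state rules are also merged into a single set for brevity, and the lag threshold is left out.

```python
from typing import Dict, List, Tuple

# Roles counted as synchronous for each --sync-type value (illustrative mapping).
SYNC_ROLES = {
    "any": {"sync_standby", "quorum_standby"},
    "sync": {"sync_standby"},
    "quorum": {"quorum_standby"},
}
REPLICA_ROLES = {"replica", "sync_standby", "quorum_standby"}
# Simplification: accepts the states of both patroni < 3.0.4 and >= 3.0.4.
HEALTHY_STATES = {"running", "streaming", "in archive recovery"}


def count_replicas(members: List[Dict], sync_type: str = "any") -> Tuple[int, int, int]:
    """Return (healthy, unhealthy, sync) replica counts for a list of members."""
    leader_timeline = next(m["timeline"] for m in members if m["role"] == "leader")
    healthy = unhealthy = sync = 0
    for member in members:
        if member["role"] not in REPLICA_ROLES:
            continue  # leaders and other roles are not replicas
        same_timeline = member["timeline"] == leader_timeline
        if member["state"] in HEALTHY_STATES and same_timeline:
            healthy += 1
            # Assumption: only healthy replicas are counted as sync replicas.
            if member["role"] in SYNC_ROLES[sync_type]:
                sync += 1
        else:
            unhealthy += 1
    return healthy, unhealthy, sync


# Hypothetical cluster; mixing both standby roles is only for illustration.
members = [
    {"name": "srv1", "role": "leader", "state": "running", "timeline": 5},
    {"name": "srv2", "role": "sync_standby", "state": "streaming", "timeline": 5},
    {"name": "srv3", "role": "quorum_standby", "state": "streaming", "timeline": 5},
    {"name": "srv4", "role": "replica", "state": "stopped", "timeline": 5},
]
print(count_replicas(members, "any"))     # (2, 1, 2)
print(count_replicas(members, "quorum"))  # (2, 1, 1)
```

With `--sync-type any`, both kinds of standby count towards the sync replica thresholds; `sync` or `quorum` restricts the count to one kind.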

### cluster_has_scheduled_action
@@ -310,6 +325,7 @@ Usage: check_patroni cluster_node_count [OPTIONS]
* replica
* standby_leader
* sync_standby
* quorum_standby
* demoted
* promoted
* uninitialized
@@ -327,7 +343,7 @@ Usage: check_patroni cluster_node_count [OPTIONS]
The "healthy" check only ensures that:
* a leader has the running state
* a standby_leader has the running or streaming (V3.0.4) state
* a replica or sync-standby has the running or streaming (V3.0.4) state
* a replica, quorum_standby or sync_standby has the running or streaming (V3.0.4) state
Since we don't check the lag or timeline, "in archive recovery" is not
considered a valid state for this service. See cluster_has_leader and
@@ -468,10 +484,11 @@ Usage: check_patroni node_is_replica [OPTIONS]
noloadbalance tag and the lag is under the maximum threshold, 0 otherwise.
Options:
--max-lag TEXT maximum allowed lag
--is-sync check if the replica is synchronous
--is-async check if the replica is asynchronous
--help Show this message and exit.
--max-lag TEXT maximum allowed lag
--is-sync check if the replica is synchronous
--sync-type [any|sync|quorum] Synchronous replication mode. [default: any]
--is-async check if the replica is asynchronous
--help Show this message and exit.
```

### node_patroni_version
