Merge pull request #151 from DFE-Digital/939-bug-teaching-vacancies-d…
…b-100-cpu

[939] Document enhanced postgres monitoring
saliceti authored Feb 2, 2024
2 parents 1cfd963 + 6826b43 commit d097e55
Showing 1 changed file with 34 additions and 4 deletions.
38 changes: 34 additions & 4 deletions documentation/postgres-faq.md
@@ -14,23 +14,23 @@ Note that make commands and directories may be slightly different for your servi
This example changes the postgres admin password for the development environment of the service.

1. Set terraform env
```shell
$ make development terraform-init
```

2. Get the password resource (chdir directory may be different for your service)
```shell
$ terraform -chdir=terraform/aks state list |grep random_password
module.postgres.random_password.password[0]
```

3. Taint the password resource so it will be regenerated on next terraform apply
```shell
$ terraform -chdir=terraform/aks taint module.postgres.random_password.password[0]
```

4. Terraform plan should show that the password will be recreated on next run, alongside updates to application secrets and app deployments (due to a change to the DATABASE_URL).
```shell
$ make development terraform-plan

...
@@ -43,3 +43,33 @@ $ make development terraform-plan
}
...
```

## Monitor performance
When [monitoring](https://github.com/DFE-Digital/terraform-modules/blob/6278cbb72bfcf614e6f1572f5f5380a3543f5924/aks/postgres/variables.tf#L115) is enabled, metrics and logs are available in the *Monitoring* section of the postgres server page in the Azure portal.

Active queries are listed in the `pg_stat_activity` view, which should be checked first. Use [konduit](https://github.com/DFE-Digital/teacher-services-cloud/blob/main/scripts/konduit.sh) to connect.
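
As a starting point, the check above could look like the sketch below. It only prints a query, which can then be pasted into the psql session opened by konduit. The function name `active_queries_sql`, the column selection and the 80-character truncation are illustrative choices, not part of this repository.

```shell
# Sketch only: prints a query against pg_stat_activity for pasting into psql.
# The function name and the selected columns are illustrative assumptions.
active_queries_sql() {
  cat <<'SQL'
SELECT pid,
       usename,
       state,
       wait_event_type,
       wait_event,
       now() - query_start AS runtime,
       left(query, 80)     AS query
FROM pg_stat_activity
WHERE state <> 'idle'              -- skip idle connections
  AND pid <> pg_backend_pid()      -- skip this monitoring session
ORDER BY runtime DESC;
SQL
}

active_queries_sql
```

Long-running rows at the top of the result, or many rows sharing the same `wait_event`, are usually the first things worth investigating.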

Azure offers further tools for analysing database performance. They can be useful in case of slowness or high resource usage.

Take note of any customisations you make and remove them when they are no longer needed. Also be aware that running terraform may discard all manual changes.

### Enable server parameters
- [metrics.collector_database_activity](https://learn.microsoft.com/en-us/azure/postgresql/flexible-server/concepts-monitoring#enabling-enhanced-metrics): capture enhanced metrics related to Activity, Database, Logical replication, Replication, Saturation, Traffic
- [metrics.autovacuum_diagnostics](https://learn.microsoft.com/en-us/azure/postgresql/flexible-server/concepts-monitoring#autovacuum-metrics): capture metrics related to the [postgres autovacuum process](https://www.postgresql.org/docs/14/routine-vacuuming.html)
- [pg_qs.query_capture_mode](https://learn.microsoft.com/en-us/azure/postgresql/single-server/concepts-query-store-best-practices#set-the-optimal-query-capture-mode): set to *All* (performance impact) or *Top* to analyse queries in the query store
- [pgms_wait_sampling.query_capture_mode](https://learn.microsoft.com/en-us/azure/postgresql/single-server/concepts-query-store-best-practices#set-the-optimal-query-capture-mode): set to *All* to capture wait statistics in the query store
- [track_io_timing](https://learn.microsoft.com/en-us/azure/postgresql/flexible-server/how-to-high-io-utilization#the-pg_stat_statements-extension): capture I/O timing metrics, required for troubleshooting
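
The parameters above can be changed in the portal. As an alternative, the sketch below prints the equivalent Azure CLI commands, assuming `az postgres flexible-server parameter set` is available and you are logged in. The resource group and server names are hypothetical placeholders, and the values (`on`, `top`, `all`) are assumptions based on the list above, so check them against the allowed values shown in the portal before applying.

```shell
# Sketch only: prints one az command per parameter instead of running it.
# RESOURCE_GROUP and SERVER_NAME are hypothetical placeholders.
RESOURCE_GROUP="${RESOURCE_GROUP:-my-resource-group}"
SERVER_NAME="${SERVER_NAME:-my-postgres-server}"

print_parameter_commands() {
  # Each input line is "<parameter name> <value>"; values are assumptions.
  while read -r name value; do
    echo "az postgres flexible-server parameter set --resource-group $RESOURCE_GROUP --server-name $SERVER_NAME --name $name --value $value"
  done <<'PARAMS'
metrics.collector_database_activity on
metrics.autovacuum_diagnostics on
pg_qs.query_capture_mode top
pgms_wait_sampling.query_capture_mode all
track_io_timing on
PARAMS
}

print_parameter_commands
```

Printing rather than executing keeps the change reviewable. Once the commands look right, they can be run one by one, and the same list doubles as a record of what to revert afterwards.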

### Send metrics to Log Analytics
By default, only logs are stored in the Log Analytics workspace. Storing metrics as well is required for troubleshooting.

1. Navigate to Monitoring > Diagnostic settings
1. Edit the existing setting
1. Tick `AllMetrics` and click `Save`

### Analyse
After 15-30 minutes, the following tools can be used:
- Intelligent Performance > Query performance impact
- Help > Troubleshooting guides
- Monitoring > Workbooks
- The query store is available in the `azure_sys` database on the same server
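
To illustrate the last item, the sketch below prints a query that lists the slowest statements recorded by the query store. It assumes the `query_store.qs_view` view and the `query_id`, `calls`, `mean_time` and `total_time` columns described in the Azure documentation; the function name `slow_queries_sql` and the limit of 10 are illustrative.

```shell
# Sketch only: prints a query for pasting into a psql session connected to
# the azure_sys database. View and column names are assumed from Azure docs.
slow_queries_sql() {
  cat <<'SQL'
SELECT query_id,
       calls,
       mean_time,
       total_time
FROM query_store.qs_view
ORDER BY mean_time DESC
LIMIT 10;
SQL
}

slow_queries_sql
```

In a psql session opened with konduit, switch to the query store database with `\c azure_sys` before running the printed query.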
