From cdbacb67c62fe6501c6c5164eda62937c71373a1 Mon Sep 17 00:00:00 2001
From: Mike Kelly
Date: Mon, 9 Dec 2024 13:57:26 +0000
Subject: [PATCH] Take backups via `sqlite ".backup"`.

Taking backups via the sqlite CLI is simple and clean. We can restore
from them. We haven't been able to locally restore from the `dumpdata`
fixtures-based backups.

For now, keep the old-format backups just in case we want to try to
restore from them again. After 30 days of creating new backups we can
update the script to not consider the old format and location.

Start taking backups in `/storage/backup` rather than directly in
`/storage` to aid navigation and management of backups.

Set -x in `backup.sh` to print the commands used as this aids remote
debugging.
---
 DEPLOY.md            |  2 +-
 DEVELOPERS.md        | 46 +++++++++++++++++++++++++++++++++-----------
 deploy/bin/backup.sh | 37 ++++++++++++++++++++++++-----------
 3 files changed, 62 insertions(+), 23 deletions(-)

diff --git a/DEPLOY.md b/DEPLOY.md
index 40de6f6a..e317b29a 100644
--- a/DEPLOY.md
+++ b/DEPLOY.md
@@ -66,7 +66,7 @@ Check cron tasks:
 dokku$ dokku cron:list opencodelists
 ```

-Backups are saved to `/var/lib/dokku/data/storage/opencodelists` on dokku3.
+Backups are saved to `/var/lib/dokku/data/storage/opencodelists/backup` on dokku3.

 ### Manually deploying

diff --git a/DEVELOPERS.md b/DEVELOPERS.md
index de48a755..df7b616d 100644
--- a/DEVELOPERS.md
+++ b/DEVELOPERS.md
@@ -46,23 +46,47 @@ A place to put scripts to be run via [runscript](https://django-extensions.readt

 ## Production database and backups

-The production database and backups are located at `/var/lib/dokku/data/storage/opencodelists` on dokku3 (see also [deployment notes](DEPLOY.md)).
-This database is the core (default) database;
-the coding system databases are located within the `coding_systems` subdirectory.
+Production data is stored on dokku3 at `/storage/` within the container layer
+file system. This maps to `/var/lib/dokku/data/storage/opencodelists` in the
+host operating system's file system. See also [deployment notes](DEPLOY.md).

-The backups are created with the dumpdata management command (`deploy/bin/backup.sh`).
-They can be restored with:
+`/storage/db.sqlite3` is the core Django database.

-```sh
-mv db.sqlite3 previous-db.sqlite3
+`/storage/coding_systems` contains the coding system databases. These are
+read-only. Refer to their README files for information on the source data and
+creation process.
+
+The core database is fully backed up daily on the local file system. Coding
+system databases are not backed up locally but can be recreated from source.
+Weekly backups of the droplets allow a restore of the file system.
+
+The core database backups are located at `/storage/backup/db`. They are created
+by `deploy/bin/backup.sh` scheduled via `cron` as configured in `app.json`.
+Backups are taken via the `sqlite3` `.backup` command. These are effectively
+copies of the database file. They are compressed to save space.

-python manage.py migrate
+To restore from a backup, use the command-line tool to create a fresh temporary
+backup of the current state of the database (in case anything goes wrong),
+then restore from the decompressed backup file. On the production server:

-python manage.py loaddata core-data-.json
+```sh
+dokku enter opencodelists
+sqlite3 /app/db.sqlite3 ".backup /storage/backup/previous-db.sqlite3"
+
+cp /storage/backup/db/{PATH_TO_BACKUP_GZ} /storage/backup
+gunzip /storage/backup/{PATH_TO_BACKUP_GZ}
+sqlite3 /app/db.sqlite3 ".restore /storage/backup/{PATH_TO_BACKUP_SQLITE}"
 ```

-When all is confirmed working with the restore,
-you can delete `previous-db.sqlite3`.
+When all is confirmed working with the restore, you can delete
+`previous-db.sqlite3`.
+
+The latest backup is available via symlink at
+`/storage/backup/db/latest-db.sqlite3.gz`. You can use `scp`, `gunzip` and
+`sqlite3 ".restore"` to bring your local database into the same state as the
+production database. You may also wish to retrieve the coding systems
+databases, otherwise you will not be able to interact with codelists that
+require them.

 ## Local development

diff --git a/deploy/bin/backup.sh b/deploy/bin/backup.sh
index 4f2a5b16..f5058e81 100755
--- a/deploy/bin/backup.sh
+++ b/deploy/bin/backup.sh
@@ -1,17 +1,32 @@
 #!/bin/bash

-set -euo pipefail
+set -euxo pipefail

+# We are changing the backup format and where they are stored. We want to
+# retain 30 days of backups across both locations and formats. Once there
+# are none of the old format remaining, this can be updated to just refer
+# to the new location.
 REPO_ROOT="/app"
-BACKUP_DIR="/storage"
+ORIGINAL_BACKUP_DIR="/storage"
+BACKUP_DIR="$ORIGINAL_BACKUP_DIR/backup/db"

-python \
-"$REPO_ROOT"/manage.py \
-dumpdata \
-builder codelists opencodelists \
---indent 2 \
---verbosity 0 \
---output "${BACKUP_DIR}/core-data-$(date +%F).json.gz"
+# Make the backup dir if it doesn't exist.
+mkdir -p "$BACKUP_DIR"

-# Keep only the last 30 backups
-find "$BACKUP_DIR" -name "core-data-*.json.gz" | sort | head -n -30 | xargs rm
+# Take a datestamped backup.
+BACKUP_FILENAME="$BACKUP_DIR/$(date +%F)-db.sqlite3"
+sqlite3 "$REPO_ROOT/db.sqlite3" ".backup $BACKUP_FILENAME"
+
+# Compress the latest backup.
+gzip -f "$BACKUP_FILENAME"
+
+# Symlink to the new latest backup to make it easy to discover.
+ln -sf "$BACKUP_FILENAME.gz" "$BACKUP_DIR/latest-db.sqlite3.gz"
+
+# Keep only the last 30 days of backups.
+# For now, apply this to both the original backup dir with backups based on the
+# Django dumpdata management command and the new dir with backups based on
+# sqlite .backup. Once there are none of the former remaining, the first line can be
+# removed, along with most of this comment.
+find "$ORIGINAL_BACKUP_DIR" -name "core-data-*.json.gz" -type f -mtime +30 -exec rm {} \;
+find "$BACKUP_DIR" -name "*-db.sqlite3.gz" -type f -mtime +30 -exec rm {} \;
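
The retention step in `backup.sh` can be tried out away from production with a sketch like the one below. It is not part of the patch: the `prune_old_backups` helper, the throwaway directory and the file names are hypothetical, and `touch -d "40 days ago"` assumes GNU coreutils. It shows that `-type f -mtime +30` removes an old backup while leaving a recent one and the `latest-db.sqlite3.gz` symlink, which `-type f` does not match even though its name fits the pattern.

```sh
#!/bin/bash
set -euo pipefail

# Hypothetical helper mirroring the find invocation used in backup.sh:
# delete regular files matching a name pattern that are older than N days.
prune_old_backups() {
    local dir="$1" pattern="$2" days="$3"
    find "$dir" -name "$pattern" -type f -mtime +"$days" -exec rm {} \;
}

# Work in a throwaway directory so no real backups are touched.
demo_dir="$(mktemp -d)"
trap 'rm -rf "$demo_dir"' EXIT

# One backup backdated beyond the retention window (GNU touch syntax),
# one fresh backup, and a "latest" symlink like the one backup.sh creates.
touch -d "40 days ago" "$demo_dir/2024-10-01-db.sqlite3.gz"
touch "$demo_dir/2024-12-09-db.sqlite3.gz"
ln -sf "$demo_dir/2024-12-09-db.sqlite3.gz" "$demo_dir/latest-db.sqlite3.gz"

prune_old_backups "$demo_dir" "*-db.sqlite3.gz" 30

# List what survived the prune.
ls "$demo_dir"
```

Because retention is now based on file modification time rather than on a count of files, a period with no new backups no longer protects the older ones from deletion. That is a trade-off worth keeping in mind if the cron job is ever paused.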