Make DB backup use Zstandard compression.
Zstandard is a fast, modern, lossless data compression algorithm. For these
backup files, it gives marginally better compression ratios than `gzip` and much
faster compression and, especially, decompression. We want the backup process
to be quick because it is a CPU-intensive activity that could affect site
performance.

Experimental comparison of different compression utilities with their default settings:

```
Compression Test Results for 'latest-db.sqlite3':
Method | Compression Time (s) | Original Size | Compressed Size | Compression Ratio
-------|----------------------|---------------|-----------------|------------------
gzip   |               272.43 | 4.1G          | 1.1G            | 73%
XZ     |              3151.21 | 4.1G          | 606M            | 86%
7Z     |               985.42 | 4.1G          | 616M            | 86%
BZIP2  |               462.39 | 4.1G          | 840M            | 80%
ZSTD   |                30.14 | 4.1G          | 1.1G            | 75%
```

Zstandard offers the clear best balance between speed and compression ratio.
Every attempt to increase the compression ratio further, including extra flags
to zstd, slowed the process significantly. If local disk usage by backups
becomes a concern, we should consider additional or external storage,
incremental backups, or a separate archival process.
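
A comparison like the one above can be reproduced with a short script. The following is a minimal sketch under stated assumptions, not the script that produced the table: `bench_one` is a hypothetical helper, timing is wall-clock time rounded to whole seconds, sizes are in bytes, and 7z is left out to keep the sketch short. Each tool runs with its default settings and is skipped if it is not installed.

```sh
# Hypothetical helper: time one compressor and print a result row.
# Usage: bench_one LABEL INPUT_FILE COMMAND [ARGS...]
# COMMAND must read from stdin and write compressed data to stdout.
bench_one() {
    label="$1"
    input="$2"
    shift 2
    if ! command -v "$1" >/dev/null 2>&1; then
        printf '%-6s | not installed, skipped\n' "$label"
        return 0
    fi
    original=$(( $(wc -c < "$input") ))
    start=$(date +%s)
    compressed=$(( $("$@" < "$input" | wc -c) ))
    end=$(date +%s)
    # "Ratio" is the percentage of space saved, as in the table above.
    if [ "$original" -gt 0 ]; then
        saved=$(( 100 - 100 * compressed / original ))
    else
        saved=0
    fi
    printf '%-6s | %6ds | %12d B | %12d B | %3d%%\n' \
        "$label" "$(( end - start ))" "$original" "$compressed" "$saved"
}

# Run the comparison only when a file is given, e.g.:
#   sh bench.sh latest-db.sqlite3
if [ "$#" -ge 1 ]; then
    bench_one gzip  "$1" gzip -c
    bench_one xz    "$1" xz -c
    bench_one bzip2 "$1" bzip2 -c
    bench_one zstd  "$1" zstd -c
fi
```

Because the compressed output is piped straight into `wc -c`, the sketch needs no extra disk space, which matters for a database of several gigabytes.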
mikerkelly committed Dec 16, 2024
1 parent ef25815 commit 7984176
Showing 3 changed files with 16 additions and 8 deletions.
11 changes: 5 additions & 6 deletions DEVELOPERS.md
@@ -71,18 +71,17 @@ then restore from the decompressed backup file. On the production server:

 ```sh
 dokku enter opencodelists
-sqlite3 /app/db.sqlite3 ".backup /storage/backup/previous-db.sqlite"
+sqlite3 /storage/db.sqlite3 ".backup /storage/backup/previous-db.sqlite3"

-cp /storage/backup/db/{PATH_TO_BACKUP_GZ} /storage/backup
-gunzip /storage/backup/{PATH_TO_BACKUP_GZ}
-sqlite3 /app/db.sqlite3 ".restore /storage/backup/{PATH_TO_BACKUP_SQLITE}
+zstd -d /storage/backup/db/{PATH_TO_BACKUP_ZST} -o /storage/backup/restore-db.sqlite3
+sqlite3 /storage/db.sqlite3 ".restore /storage/backup/restore-db.sqlite3"
 ```
 When all is confirmed working with the restore, you can delete
-`previous-db.sqlite3`.
+`previous-db.sqlite3` and `restore-db.sqlite3`.
 The latest backup is available via symlink at
-`/storage/backup/db/latest-db.sqlite3.gz`. You can use `scp`, `gunzip` and
+`/storage/backup/db/latest-db.sqlite3.zst`. You can use `scp`, `zstd -d` and
 `sqlite3 ".restore"` to bring your local database into the same state as the
 production database. You may also wish to retrieve the coding systems
 databases, otherwise you will not be able to interact with codelists that
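
Before restoring from a compressed backup, it may be worth checking that the archive and the database inside it are intact. The sketch below is an illustration rather than part of this commit: `check_backup` is a hypothetical helper, and it assumes `zstd` and `sqlite3` are on the `PATH` and that the temporary directory has room for the decompressed database.

```sh
# Hypothetical helper: check a .zst backup before restoring from it.
# Usage: check_backup /path/to/backup.sqlite3.zst
check_backup() {
    archive="$1"
    if [ ! -f "$archive" ]; then
        echo "no such backup: $archive" >&2
        return 1
    fi
    # zstd -t verifies the compressed data without writing any output.
    zstd -t "$archive" || return 1
    # Decompress to a scratch file and ask SQLite to check the database itself.
    scratch=$(mktemp)
    if ! zstd -d -f -q "$archive" -o "$scratch"; then
        rm -f "$scratch"
        return 1
    fi
    result=$(sqlite3 "$scratch" "PRAGMA integrity_check;")
    rm -f "$scratch"
    # integrity_check reports the single row "ok" when no problems are found.
    [ "$result" = "ok" ]
}
```

A restore could then be guarded with something like `check_backup /storage/backup/db/latest-db.sqlite3.zst && echo "backup looks usable"`.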
11 changes: 9 additions & 2 deletions deploy/bin/backup.sh
@@ -14,17 +14,24 @@ BACKUP_FILEPATH="$BACKUP_DIR/$BACKUP_FILENAME"
 sqlite3 "$DATABASE_DIR/db.sqlite3" ".backup $BACKUP_FILEPATH"

 # Compress the latest backup.
-gzip -f "$BACKUP_FILEPATH"
+# Zstandard is a fast, modern, lossless data compression algorithm. It gives
+# marginally better compression ratios than gzip on the backup and much faster
+# compression and particularly decompression. We want the backup process to be
+# quick as it's a CPU-intensive activity that could affect site performance.
+# --rm flag removes the source file after compression.
+zstd "$BACKUP_FILEPATH" --rm

 # Symlink to the new latest backup to make it easy to discover.
 # Make the target a relative path -- an absolute one won't mean the same thing
 # in the host file system if executed inside a container as we expect.
-ln -sf "$BACKUP_FILENAME.gz" "$BACKUP_DIR/latest-db.sqlite3.gz"
+ln -sf "$BACKUP_FILENAME.zst" "$BACKUP_DIR/latest-db.sqlite3.zst"

 # Keep only the last 30 days of backups.
 # For now, apply this to both the original backup dir with backups based on the
 # Django dumpdata management command and the new dir with backups based on
 # sqlite .backup. Once there are none of the former remaining, the first line can be
 # removed, along with most of this comment.
 find "$DATABASE_DIR" -name "core-data-*.json.gz" -type f -mtime +30 -exec rm {} \;
+# We initially compressed with gzip; this line can be removed when none are left.
 find "$BACKUP_DIR" -name "*-db.sqlite3.gz" -type f -mtime +30 -exec rm {} \;
+find "$BACKUP_DIR" -name "*-db.sqlite3.zst" -type f -mtime +30 -exec rm {} \;
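
The comment about relative symlink targets can be illustrated in a scratch directory. This is a sketch only: `make_latest_links`, the `latest-absolute.zst` link, and the dated filename are hypothetical and not taken from backup.sh. Renaming the directory stands in for the same files appearing at a different path on the host than inside the container.

```sh
# Hypothetical demo: create a relative and an absolute "latest" link in DIR.
make_latest_links() {
    dir="$1"
    name="2024-12-16-db.sqlite3.zst"   # illustrative filename only
    : > "$dir/$name"
    # Relative target: just the filename, resolved against the link's own directory.
    ln -sf "$name" "$dir/latest-db.sqlite3.zst"
    # Absolute target, for contrast: tied to the path as seen by this process.
    ln -sf "$dir/$name" "$dir/latest-absolute.zst"
}

scratch=$(mktemp -d)
mkdir "$scratch/backup"
make_latest_links "$scratch/backup"

# Simulate the directory being visible under a different path.
mv "$scratch/backup" "$scratch/host-view"

[ -e "$scratch/host-view/latest-db.sqlite3.zst" ] && echo "relative link resolves"
[ -e "$scratch/host-view/latest-absolute.zst" ] || echo "absolute link is dangling"
rm -rf "$scratch"
```

After the rename, the relative link still resolves while the absolute one dangles, so both messages are printed.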
2 changes: 2 additions & 0 deletions docker/dependencies.txt
@@ -4,3 +4,5 @@ python3.12
 python3.12-venv
 sqlite3
 tzdata
+# Fast, modern compression utility. Compress backups. Search 'zstd' to find uses.
+zstd
