
Use sqlite cli to backup core database #2151

Closed
6 tasks done
lucyb opened this issue Nov 19, 2024 · 2 comments · Fixed by #2214, #2242, #2246 or #2248


lucyb commented Nov 19, 2024

We recently noticed a problem with restoring the existing backup of the core database (#2091).

Rather than try to fix it, this is a proposal to replace the current backup mechanism with the SQLite CLI's built-in functionality instead.

We should also strongly consider writing the backups to a different directory from the production SQLite database, for ease of management and to reduce the potential for errors.

See slack 🧵.

This blog post evaluates the options available for backing up a SQLite db.

.dump vs .backup

.dump was the original suggestion in this ticket. It's designed for data migration and is similar to dumpdata, producing text output of the SQL commands required to reconstruct the DB schema and contents. .backup is an alternative SQLite CLI command designed for backups that produces binary output.

.backup's output is smaller and binary; it is less compressible but more amenable to de-duplication if we need to move to incremental backups to save space (as described in the linked blog post).

.backup is effectively a file copy taken with the database locked; it is faster than .dump for both backup and restore, and scales better.

.dump is portable to other database systems or versions; .backup is not.

Each has things to recommend it, but I (Mike) have chosen .backup: backups are its intended use case, it is faster, and it leaves more room to optimise for space if we need to. It's also more aligned with the backup approach envisaged in #2153 for the coding system databases.
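As a rough illustration of the two approaches, Python's stdlib sqlite3 module exposes the same online-backup API as the CLI's .backup, and iterdump() produces output equivalent to .dump. A minimal sketch (the table and data are made up for illustration, not the real schema):

```python
import sqlite3

# Source DB with some illustrative data (names here are made up).
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE codelist (id INTEGER PRIMARY KEY, name TEXT)")
src.execute("INSERT INTO codelist (name) VALUES ('example')")
src.commit()

# .backup equivalent: a page-level binary copy via the online-backup API.
dest = sqlite3.connect(":memory:")
src.backup(dest)

# .dump equivalent: a text stream of SQL that rebuilds schema and contents.
dump_sql = "\n".join(src.iterdump())
```

The binary copy is usable as a database file as-is, while the text dump has to be replayed through the SQL engine, which is part of why .backup is faster to restore.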

Assumptions

We'll be using the official sqlite CLI, which is less likely to have problems than manage.py dumpdata / loaddata. It's also likely to use less memory than the current manage.py loaddata command, which may be significant on our relatively small server.

The current approach doesn't back up some sensitive tables (namely the Django session tables), to support its use in local development. I don't believe that will be an issue: we are planning to move away from using production data in local development (except for a specified purpose), and in practice people aren't using this data anyway.

Acceptance criteria

Note:
There will be a separate issue for the coding systems databases -- #2153, #2225.

This ticket will fix #2091.

@lucyb lucyb changed the title Use sqllite to backup core database Use sqlite cli to backup core database Nov 19, 2024
@mikerkelly mikerkelly self-assigned this Dec 4, 2024
@mikerkelly
Copy link
Contributor

mikerkelly commented Dec 4, 2024

First, I manually created a backup, copied it to my local development environment, and restored it to confirm the backup mechanism works correctly. Having completed this process successfully, I believe transitioning to this backup strategy should be straightforward. I'll update backup.sh, make sure it does the right thing locally, then open a PR. app.json already runs that script once a day in prod.
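For what backup.sh's core step might look like, here's a hedged Python sketch using the stdlib sqlite3 online-backup API (the same mechanism as the CLI's .backup). The timestamped filename and separate backup directory follow the proposal above, but the paths are assumptions, not the real production ones:

```python
import sqlite3
from datetime import datetime
from pathlib import Path


def backup_db(db_path, backup_dir):
    """Copy db_path into backup_dir under a timestamped name, using the
    SQLite online-backup API so the copy is consistent while the DB is live."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)  # e.g. /storage/backup/db
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = backup_dir / f"db-{stamp}.sqlite"
    src = sqlite3.connect(db_path)
    dest = sqlite3.connect(target)
    try:
        src.backup(dest)
    finally:
        dest.close()
        src.close()
    return target
```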

Commands used, for reproducibility; indentation represents increasingly nested shells.

ssh mikerkelly@dokku3.ebmdatalab.net

    dokku enter opencodelists

        mkdir -p /storage/backup/db
        sqlite3 db.sqlite3 ".backup /storage/backup/db/test_backup.sqlite"
        exit

    cp /var/lib/dokku/data/storage/opencodelists/backup/db/test_backup.sqlite .
    exit

scp mikerkelly@dokku3.ebmdatalab.net:~/test_backup.sqlite ~/opencodelists
cd ~/opencodelists
sqlite3 db.sqlite3 ".backup mike-ocl.sqlite3"
sqlite3 db.sqlite3 ".restore test_backup.sqlite"

This seemingly just works: e.g. I get all of the codelists listed on the homepage. But if I click into one it fails due to the lack of a coding system DB. Presumably if I copied and restored all of those too I'd have a fully working site. The linked issues discuss how the coding systems live in separate DBs and might need a different backup strategy.
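A quick way to sanity-check a restored file, sketched with the stdlib sqlite3 module. The particular checks are my suggestion, not an existing project helper:

```python
import sqlite3


def check_restored(db_path):
    """Return True if the restored DB passes SQLite's integrity check
    and contains at least one table."""
    con = sqlite3.connect(db_path)
    try:
        ok = con.execute("PRAGMA integrity_check").fetchone()[0] == "ok"
        tables = [
            row[0]
            for row in con.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table'"
            )
        ]
        return ok and bool(tables)
    finally:
        con.close()
```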

Issue #2117 highlights the occasional need to restore from production, though this requires care to promptly discard any sensitive data (e.g. deleting the database after debugging). It might be worth exploring ways to sanitize the production database during restoration. Perhaps dropping the user and sessions tables would suffice? However, since codelists have links to the users that created them, we might instead need to replace any PII in the user table with placeholder data.
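A hypothetical sanitisation sketch. The table and column names here (django_session, and an opencodelists_user table with email and name columns) are assumptions about the schema, not confirmed against the real database:

```python
import sqlite3


def sanitise(db_path):
    """Drop session data and overwrite user PII with placeholders, keeping
    the user rows themselves because codelists reference their creators.
    Table/column names are assumed, not taken from the real schema."""
    con = sqlite3.connect(db_path)
    with con:
        con.execute("DROP TABLE IF EXISTS django_session")
        con.execute(
            "UPDATE opencodelists_user "
            "SET email = 'user' || rowid || '@example.com', "
            "    name = 'User ' || rowid"
        )
    con.close()
```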

If the latest core database is available via a fixed symlink, we could potentially write a script to automate downloading and restoring it, alongside a management command to sanitize PII. This could be a follow-up ticket.

It takes about 20 seconds to make the backup, 5 minutes to transfer it to my local machine, and a few seconds to decompress and restore from it. gzip takes about 120s; gzip -9 takes about 435s but barely improves compression (<1%). So this should also be faster than the existing job, which seems to take about 20 minutes given that the database files have timestamps like 01:22 when the job is triggered at 01:00.

It's about 4.1GB in size; gzip compression gets it down to 1.1GB. Current backups are about 400MB, so daily backups for 30 days would take ~33GB instead of ~12GB, about 2.75 times as much. There are 55GB currently available on the prod file system. We could try vacuuming the database, other ways of reducing its size, incremental backups, or another backup strategy, but for this ticket I want to keep things simple. See #2215.
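The storage arithmetic above, worked through (the sizes are the approximate figures quoted, not fresh measurements):

```python
def retention_gb(daily_backup_gb, days=30):
    """Total space for `days` daily backups of `daily_backup_gb` each."""
    return daily_backup_gb * days


new_scheme = retention_gb(1.1)  # gzipped .backup files, ~33 GB over 30 days
old_scheme = retention_gb(0.4)  # current ~400MB backups, ~12 GB over 30 days
ratio = new_scheme / old_scheme  # ~2.75x as much space
```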

@mikerkelly

All acceptance criteria now have a PR linked, will close when all are closed (automatically?!).
