
Use sqlite cli to backup core database #2151

Closed
6 tasks done
lucyb opened this issue Nov 19, 2024 · 2 comments · Fixed by #2214, #2242, #2246 or #2248


lucyb commented Nov 19, 2024

We recently noticed a problem with restoring the existing backup of the core database (#2091).

Rather than try to fix it, this is a proposal to replace the current backup mechanism with the SQLite CLI's built-in functionality instead.

We should also strongly consider writing the backups to a different directory from the production SQLite database, for ease of management and to reduce the potential for errors.

See slack 🧵.

This blog post evaluates the options available for backing up a SQLite db.

.dump vs .backup

.dump was the original suggestion in this ticket. It's designed for data migration and is similar to dumpdata, producing text output of the SQL commands required to reconstruct the DB schema and contents. .backup is an alternative SQLite CLI command designed for backups that produces binary output.

.backup's output is smaller and binary; it is less compressible but more amenable to de-duplication if we need to move to incremental backups to save space (as described in the linked blog post).

.backup is effectively a file copy taken with the database locked; it is faster than .dump for both backup and restore, and scales better.

.dump is portable to other database systems or versions; .backup is not.

Each has things to recommend it, but I (Mike) have chosen .backup: backups are its intended use case, it is faster, and it leaves more room to optimise for space if we need to. It's also more aligned with the backup approach envisaged in #2153 for the coding system databases.
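As a rough illustration of the two approaches, Python's stdlib sqlite3 module exposes the same online-backup API as the CLI's .backup, and iterdump() produces output equivalent to .dump. A minimal sketch (the table and data are made up for illustration, not the real schema):

```python
import sqlite3

# Source DB with some illustrative data (names here are made up).
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE codelist (id INTEGER PRIMARY KEY, name TEXT)")
src.execute("INSERT INTO codelist (name) VALUES ('example')")
src.commit()

# .backup equivalent: a page-level binary copy via the online-backup API.
dest = sqlite3.connect(":memory:")
src.backup(dest)

# .dump equivalent: a text stream of SQL that rebuilds schema and contents.
dump_sql = "\n".join(src.iterdump())
```

The binary copy is usable as a database file as-is, while the text dump has to be replayed through the SQL engine, which is part of why .backup is faster to restore.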

Assumptions

We'll be using the official sqlite CLI, which is less likely to have problems than manage.py dumpdata / loaddata. It's also likely to use less memory than the current manage.py loaddata command, which may be significant on our relatively small server.

The current approach doesn't back up some sensitive tables (namely the Django session tables), to support its use in local development. I don't believe that will be an issue: we are planning to move away from using production data in local development (except for a specified purpose), and in practice people aren't using this data anyway.

Acceptance criteria

Note:
There will be a separate issue for the coding systems databases -- #2153, #2225.

This ticket will fix #2091.

@lucyb lucyb changed the title Use sqllite to backup core database Use sqlite cli to backup core database Nov 19, 2024
@mikerkelly mikerkelly self-assigned this Dec 4, 2024
@mikerkelly
Copy link
Contributor

mikerkelly commented Dec 4, 2024

First, I manually created a backup, copied it to my local development environment, and restored it to confirm the backup mechanism works correctly. Having completed this process successfully, I believe transitioning to this backup strategy should be straightforward. I'll update backup.sh, make sure it does the right thing locally, then open a PR. app.json already runs that script once a day in prod.
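For what backup.sh's core step might look like, here's a hedged Python sketch using the stdlib sqlite3 online-backup API (the same mechanism as the CLI's .backup). The timestamped filename and separate backup directory follow the proposal above, but the paths are assumptions, not the real production ones:

```python
import sqlite3
from datetime import datetime
from pathlib import Path


def backup_db(db_path, backup_dir):
    """Copy db_path into backup_dir under a timestamped name, using the
    SQLite online-backup API so the copy is consistent while the DB is live."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)  # e.g. /storage/backup/db
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = backup_dir / f"db-{stamp}.sqlite"
    src = sqlite3.connect(db_path)
    dest = sqlite3.connect(target)
    try:
        src.backup(dest)
    finally:
        dest.close()
        src.close()
    return target
```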

Commands used, for reproducibility; indentation represents increasingly nested shells.

ssh mikerkelly@dokku3.ebmdatalab.net

    dokku enter opencodelists

        mkdir -p /storage/backup/db
        sqlite3 db.sqlite3 ".backup /storage/backup/db/test_backup.sqlite"
        exit

    cp /var/lib/dokku/data/storage/opencodelists/backup/db/test_backup.sqlite .
    exit

scp mikerkelly@dokku3.ebmdatalab.net:~/test_backup.sqlite ~/opencodelists
cd ~/opencodelists
sqlite3 db.sqlite3 ".backup mike-ocl.sqlite3"
sqlite3 db.sqlite3 ".restore test_backup.sqlite"

This seemingly just works: e.g. I get all of the codelists listed on the homepage. But if I click into one it fails due to the lack of a coding system DB. Presumably if I copied and restored all of those too I'd have a fully working site. The linked issues discuss how the coding systems live in separate DBs and might need a different backup strategy.
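A quick way to sanity-check a restored file, sketched with the stdlib sqlite3 module. The particular checks are my suggestion, not an existing project helper:

```python
import sqlite3


def check_restored(db_path):
    """Return True if the restored DB passes SQLite's integrity check
    and contains at least one table."""
    con = sqlite3.connect(db_path)
    try:
        ok = con.execute("PRAGMA integrity_check").fetchone()[0] == "ok"
        tables = [
            row[0]
            for row in con.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table'"
            )
        ]
        return ok and bool(tables)
    finally:
        con.close()
```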

Issue #2117 highlights the occasional need to restore from production, though this requires care to promptly discard any sensitive data (e.g. deleting the database after debugging). It might be worth exploring ways to sanitize the production database during restoration. Perhaps dropping the user and sessions tables would suffice? However, since codelists have links to the users that created them, we might instead need to replace any PII in the user table with placeholder data.
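A hypothetical sanitisation sketch. The table and column names here (django_session, and an opencodelists_user table with email and name columns) are assumptions about the schema, not confirmed against the real database:

```python
import sqlite3


def sanitise(db_path):
    """Drop session data and overwrite user PII with placeholders, keeping
    the user rows themselves because codelists reference their creators.
    Table/column names are assumed, not taken from the real schema."""
    con = sqlite3.connect(db_path)
    with con:
        con.execute("DROP TABLE IF EXISTS django_session")
        con.execute(
            "UPDATE opencodelists_user "
            "SET email = 'user' || rowid || '@example.com', "
            "    name = 'User ' || rowid"
        )
    con.close()
```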

If the latest core database is available via a fixed symlink, we could potentially write a script to automate downloading and restoring it, alongside a management command to sanitize PII. This could be a follow-up ticket.

It takes about 20 seconds to make the backup, 5 minutes to transfer it to my local machine, and a few seconds to decompress and restore from it. gzip takes about 120s; gzip -9 takes about 435s but barely improves compression (<1%). So this should also be faster than the existing job, which seems to take about 20 minutes given that the database files have timestamps like 01:22 when the job is triggered at 01:00.

It's about 4.1GB in size; gzip compression gets it down to 1.1GB. Current backups are about 400MB, so daily backups for 30 days would take ~33GB instead of ~12GB, about 2.75 times as much. There are 55GB currently available on the prod file system. We could try vacuuming the database, other ways of reducing its size, incremental backups, or another backup strategy, but for this ticket I want to keep things simple. See #2215.
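The storage arithmetic above, worked through (the sizes are the approximate figures quoted, not fresh measurements):

```python
def retention_gb(daily_backup_gb, days=30):
    """Total space for `days` daily backups of `daily_backup_gb` each."""
    return daily_backup_gb * days


new_scheme = retention_gb(1.1)  # gzipped .backup files, ~33 GB over 30 days
old_scheme = retention_gb(0.4)  # current ~400MB backups, ~12 GB over 30 days
ratio = new_scheme / old_scheme  # ~2.75x as much space
```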

@mikerkelly

All acceptance criteria now have a PR linked, will close when all are closed (automatically?!).
