Use sqlite cli to backup core database #2151
First, I'll manually create a backup, copy it to my local development environment, and restore it, to confirm that the backup mechanism works correctly. Having tried this and successfully completed the process, I believe transitioning to this backup strategy should be straightforward.

Commands used, for reproducibility (indentation represents increasingly nested shells):

```shell
ssh mikerkelly@dokku3.ebmdatalab.net
  dokku enter opencodelists
    mkdir -p /storage/backup/db
    sqlite3 db.sqlite3 ".backup /storage/backup/db/test_backup.sqlite"
    exit
  cp /var/lib/dokku/data/storage/opencodelists/backup/db/test_backup.sqlite .
  exit
scp mikerkelly@dokku3.ebmdatalab.net:~/test_backup.sqlite ~/opencodelists
cd ~/opencodelists
sqlite3 db.sqlite3 ".backup mike-ocl.sqlite3"
sqlite3 db.sqlite3 ".restore test_backup.sqlite"
```

This just works, seemingly: e.g. I get all of the codelists listed on the homepage. But if I click into one, it fails due to the lack of a coding system DB. Presumably if I copied and restored all of those too, I'd have a fully working site. The linked issues discuss how coding systems are in separate DBs and might need a different backup strategy.

Issue #2117 highlights the occasional need to restore from production, though this requires care to promptly discard any sensitive data (e.g. deleting the database after debugging). It might be worth exploring ways to sanitize the production database during restoration. Perhaps dropping the user and sessions tables would suffice? However, since codelists have links to the users that created them, we might instead need to replace any PII in the user table with placeholder data. If the latest core database is available via a fixed symlink, we could potentially write a script to automate downloading and restoring it, alongside a management command to sanitize PII. This could be a follow-up ticket.

It takes about 20 seconds to make the backup, 5 minutes to transfer it to my local machine, and a few seconds to unzip and restore from it. gzip takes about 120s; gzip -9 takes about 435s but doesn't really improve compression (<1%). So hopefully this will also be faster than the existing job, which seems to take about 20m, given that the database files have timestamps like 01:22 when the job is triggered at 0100.

The database is about 4.1GB in size. gzip compression gets it down to 1.1GB. Current backups are about 400MB, so daily backups for 30 days would take ~33GB instead of 12GB of space, about 2.75 times as much. There are 55GB currently available in the prod file system. We could try vacuuming the database, incremental backups, or other approaches to reducing its size, but for this ticket I want to keep things simple. See #2215.
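As a sketch of how a nightly job could chain these steps together, including verifying the copy before compressing it (paths and the toy database below are illustrative, not the real production layout; assumes the `sqlite3` CLI is on `PATH`):

```shell
set -euo pipefail

DB=demo.sqlite3
BACKUP_DIR=backup_demo
mkdir -p "$BACKUP_DIR"

# Create a toy database standing in for the core db.
sqlite3 "$DB" "CREATE TABLE IF NOT EXISTS codelist (id INTEGER PRIMARY KEY, name TEXT);
               INSERT INTO codelist (name) VALUES ('example');"

# Online backup: sqlite3 takes the necessary locks itself.
sqlite3 "$DB" ".backup $BACKUP_DIR/core.sqlite3"

# Verify the copy before trusting it; prints "ok" on success.
sqlite3 "$BACKUP_DIR/core.sqlite3" "PRAGMA integrity_check;"

# Compress for storage (plain gzip; -9 gave <1% better compression in testing).
gzip -f "$BACKUP_DIR/core.sqlite3"
```

The `integrity_check` pragma is cheap relative to the backup itself and catches a truncated or corrupted copy before the only good copy ages out of retention.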
All acceptance criteria now have a PR linked, will close when all are closed (automatically?!).
We recently noticed a problem with restoring the existing backup of the core database (#2091).
Rather than try to fix it, this is a proposal to replace the current backup with the sqlite CLI's backup functionality instead.
We should also strongly consider writing the backups to a different directory from the production sqlite database, for ease of management and to reduce the potential for errors.
See slack 🧵.
This blog post evaluates the options available for backing up a SQLite db.

.dump vs .backup

- `.dump` was the original suggestion in this ticket. It's designed for data migration and is similar to `dumpdata`, producing text output of the SQL commands required to reconstruct the DB schema and contents.
- `.backup` is an alternative SQLite CLI command designed for backups that produces binary output.
- `.backup` produces smaller, binary output that is less compressible but more amenable to de-duplication if we need to move to incremental backups to save space (as described in the linked blog post).
- `.backup` is effectively a file copy with the database locked, and is faster and more scalable than `.dump` for both backup and restore.
- `.dump` is portable to other database systems or versions; `.backup` is not.

Either has things to recommend it, but I (Mike) have chosen to use `.backup` as backup is its intended use case, it is faster, and it is more optimizable for space if we need to optimize that. It's also more aligned with the backup approach envisaged in #2153 for the coding system databases.

Assumptions

We'll be using the official sqlite cli, so it's less likely to have problems than using `manage.py dumpdata`/`loaddata`. It's also likely to use less memory than the current `manage.py loaddata` command, which may be significant on our relatively small server.

The current approach doesn't back up some sensitive tables (namely the django session tables), to support its use in local development. I don't believe that will be an issue, as we are planning to move away from using production data in local development (unless for a specified purpose), and because in practice people aren't using this data anyway.
Acceptance criteria

- Take backups via `sqlite ".backup"` #2214

Note:
There will be a separate issue for the coding systems databases -- #2153, #2225.
This ticket will fix #2091.