Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve sqlite configuration options #2344

Merged
merged 2 commits into from
Feb 14, 2025

Conversation

mikerkelly
Copy link
Contributor

@mikerkelly mikerkelly commented Feb 12, 2025

Fixes #2268.

We think immediate mode in particular might help resolve 'database is locked' errors.

Options used here are largely following suggestions from the article https://kerkour.com/sqlite-for-servers as referenced in #2268/#2251, some of which I found Django applies by default. See commit messages for more discussion. In particular, I think we should not be using an excessively large timeout as this can cause users to wait a long time if there was a problem with the site. Let's try 5 seconds and see if the other settings changes resolve the issue, then gradually increase the value if there are still many errors.

I discovered that unfortunately we have been configuring the timeout settings for the default database incorrectly, so #1143, #1811, #2276 increasing it from 5 to 30, 60, and most recently 90 seconds had no effect. This might explain why the "database is locked" errors that they sought to remedy never improved.

DATABASES = {
    "default": dj_database_url.parse(
        os.environ.get("DATABASE_URL", default="sqlite:///db.sqlite3")
    ),
    "OPTIONS": {"timeout": 90},
}

The OPTIONS parameter is a key of DATABASES, not DATABASES["default"]. Therefore it doesn't apply to any database and does not take effect. This can be verified by checking PRAGMA busy_timeout, for example in a management command (show_db_timeout.txt). By default it's 5 seconds. Correct the structure of DATABASES and the changed setting is respected.

Added tests of the settings. I wondered if this is just testing django, but we've seen a minor error in configuring DATABASES can lead to unintended results, so let's test. Also to demonstrate that defaults do indeed take effect, and that the options work are applied to the DB / transactions as we expect.

I've attached two more throwaway management commands. One (
upgraded_transaction_fail.txt) I used to satisfy myself that changing to immediate mode fixes the "database is locked" issue when a transaction start as a read and upgrades to a write while another holds the lock. And another (profile_db.txt) to try bulk writes (I think this can be used to demonstrate that transactions that start by writing are affected by the timeout regardless of transaction mode setting). I don't think the reviewer need to look at these, it's in case they're useful for debugging etc. in future. All designed to work with class TestTable(models.Model): value = models.CharField(max_length=255) in opencodelists/models.py

Approach to testing inspired by opensafely-core/airlock#248. I noticed some of the settings they apply there from the article don't need to be set explicitly in Django as it applies them automatically. But maybe they're setting them for future-proofing or to be explicit? I might mention that to team RAP.

The `OPTIONS` parameter was a key of `DATABASES`, not `DATABASES["default"]`.
Therefore it didn't apply to any database or take effect.
This can be verified by checking `PRAGMA busy_timeout`, for example in a
management command, as at the bottom of this commit.

This commit fixes that by moving `OPTIONS` into the `default` dict, and using
** keyword unpacking on the `database_url.parse result` to get the other
settings. Now the changed setting is respected, defaulting to 5 seconds as set
by Django (I can't find in the source or docs that SQLite itself applies any
timeout by default, but Django sets a timeout with the PRAGMA, as the
test/command show).

This means that unfortunately #1143, #1811, #2276 had no effect on the timeout.
This might explain why the "database is locked" errors that they sought to
remedy never improved.

This commit does not change the setting from the Django default (for now), so
as to avoid changing the behaviour from what it was before. But this sets up
future commits to change the `OPTIONS` correctly.

Added a test of the setting. I wondered if this is just testing `django`, but
we've seen a minor error in configuring `DATABASES` can lead to unintended
results, so let's test. Also to demonstrate the default does indeed take
effect as claimed here.

---

Management command:
```python
from django.core.management.base import BaseCommand
from django.db import connections

class Command(BaseCommand):
    def handle(self, *args, **options):
        conn = connections["default"]
        raw_conn = conn.connection
        cursor = raw_conn.cursor()
        cursor.execute("PRAGMA busy_timeout")
        timeout_ms = cursor.fetchone()[0]
        print(f"Timeout: {timeout_ms / 1000} seconds")
```
Options used here are largely inspired by the article
https://kerkour.com/sqlite-for-servers

We think immediate mode in particular might help resolve 'database is locked' errors.

I've largely followed the suggestions from the article, some of which I found
Django applies by default.

I wasn't sure that `PRAGMA temp_store = memory;` was better. This determines
whether files other than the super and rollback journals are written to disk
(eg journal/WAL file of writes to apply; temp index, sort, and join files).
Presumably the main trade off is a marginal speed improvement against possible
additional data loss in some circumstances. E.g., if the process is
interrupted while there are a valid set of pending writes, any incomplete DB
changes can be rolled back and the pending writes may be recoverable and
applied if using file temp_store and consistent with the resulting DB state,
but will be lost if using a memory temp_store. Conversely, maybe the default is
file due to the constrained amount of memory available in the default SQLite
configuration. The differences seem marginal and unrelated to consistency in
any case and I haven't taken time to understand this properly and be
confident in recommending a change.

I did wonder if the very slow search on the OpenCodelists main index page could
be related to using the file system for constructing a large set of joins to
perform the search but that's very speculative. I was also reminded by the
article that "SQLite doesn't keep statistics about its indexes... so COUNT
queries are slow, even when using a WHERE clause on an indexed field: SQLite
has to scan for all the matching records".

I think we should not be using an excessively large timeout as this can
sometimes cause users to wait a long time before getting an error.  Let's try 5
seconds and see if the other settings changes resolve the issue, then gradually
increase the value if there are still many errors.

Previous PRs intended to change the timeout #1143, #1811, #2276 had no effect
as the `DATABASES` property was not correctly structured.
@mikerkelly mikerkelly force-pushed the mikerkelly/sqlite-configuration-options branch from 7a302c6 to c9d83c7 Compare February 13, 2025 08:46
@Jongmassey Jongmassey merged commit beed903 into main Feb 14, 2025
5 checks passed
@Jongmassey Jongmassey deleted the mikerkelly/sqlite-configuration-options branch February 14, 2025 10:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve SQLite configuration: IMMEDIATE transaction mode and PRAGMAs
2 participants