Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade to Django v5.1.x #2115

Closed
lucyb opened this issue Nov 12, 2024 · 16 comments · Fixed by #2264
Closed

Upgrade to Django v5.1.x #2115

lucyb opened this issue Nov 12, 2024 · 16 comments · Fixed by #2264
Assignees

Comments

@lucyb
Copy link
Contributor

lucyb commented Nov 12, 2024

We could upgrade OpenCodelists to the most recent v5.1.x release.

This would make it easier for the automated dependency updater (whether Dependabot or our own in house action) to keep us up-to-date in future.

@StevenMaude
Copy link
Contributor

StevenMaude commented Nov 27, 2024

This might be easier once #1817 is complete. Now done.

@StevenMaude
Copy link
Contributor

StevenMaude commented Jan 8, 2025

The blocker here is primarily the coding system tests:

django.test.testcases.DatabaseOperationForbidden: Database threaded connections to 'dmd_2022-110_20221001' are not allowed in this test. Add 'dmd_2022-110_20221001' to pytest_django.fixtures._django_db_helper.<locals>.PytestDjangoTestCase.databases to ensure proper test isolation and silence this failure.

There's also a related Django warning that we could ignore for now:

  /opt/venv/lib/python3.12/site-packages/django/db/backends/utils.py:98: RuntimeWarning: Accessing the database during app initialization is discouraged. To fix this warning, avoid executing queries in AppConfig.ready() or when your app modules are imported.
    warnings.warn(self.APPS_NOT_READY_WARNING_MSG, category=RuntimeWarning)

There's a warning against accessing the database in this way in the Django documentation.

Here's the relevant PR where that coding system versioning code was introduced: #1397.

@StevenMaude
Copy link
Contributor

This is the relevant part of that PR's comment:

A function update_coding_system_database_connections, called from the versioning app's AppConfig.ready method, which iterates through all the CodingSystemVersion instances and adds the relevant database config to django's database connections. Note that we can't add these to the DATABASES in settings.py because we can't access the model instances until the apps are fully loaded. (An alternative would be to use the sqlite files in the filesystem - we could do that in settings.py)

@StevenMaude
Copy link
Contributor

More digging and I think that:

  • the failures are going to be specific to Django 5.1, and 5.0.x should work without any modification (needs testing)
  • the main blocker is the "threaded connections" issue, and not the AppConfig.ready() issue; if you specify the database under test explicitly with a decorator, you get a different error, which is because the test database doesn't exist in settings

@StevenMaude
Copy link
Contributor

StevenMaude commented Jan 9, 2025

So Django 5.0.10 does pass the tests. You still get the warning:

RuntimeWarning: Accessing the database during app initialization is discouraged. To fix this warning, avoid executing queries in AppConfig.ready() or when your app modules are imported.
  warnings.warn(self.APPS_NOT_READY_WARNING_MSG, category=RuntimeWarning)

but I don't think that's necessary to fix to get the Django 5.1 upgrade to work. It may be that this gets blocked by Django in future and we're forced to fix it.

I opened #2289 for this, because I believe it's separate. That possibly leaves the configuration for the coding systems tests as being the blocking issue._


⚠ Edit: maybe not, and maybe it's still related? The documentation on the test database also warns against this use of ready().

@StevenMaude
Copy link
Contributor

StevenMaude commented Jan 9, 2025

Here's the relevant Django PR that made the change for Django 5.1: django/django#17639.

It's definitely this change that's the incompatible change. Reverting it and only that change allows the tests to otherwise pass with Django 5.1.

What's odd is that commit refers to "SimpleTestCase". (I'm not sure how pytest-django interacts here, and which TestCase class it's using. I'm also not sure what magic pytest-django is doing to modify the error message, but the failure points to a line of Django code, despite the error message being different.) The behaviour if the database with enable_db_access_for_all_tests should be the same as Django's TestCase though.

@StevenMaude
Copy link
Contributor

StevenMaude commented Jan 9, 2025

This is a related Slack thread on tests failing, for other reasons.

From that, it's worth knowing about conftest.py which actually configures some of pytest-django there.

conftest.py contains database test setup, and we could add the relevant databases being complained about there:

database_aliases = {
"default",
"snomedct_test_20200101",
"dmd_test_20200101",
"icd10_test_20200101",
"ctv3_test_20200101",
"bnf_test_20200101",
}

That instead then results in complaints like:

FAILED coding_systems/bnf/tests/test_import_data.py::test_import_data - django.db.utils.OperationalError: table "bnf_concept" already exists

This occurs even if just running the bnf tests, and deleting but one of those tests.

Relevant stack trace, showing the failure at migration of the new database.

  File "…/coding_systems/bnf/tests/test_import_data.py", line 219, in test_import_data_existing_coding_system_release
    import_data(
  File "…/coding_systems/bnf/import_data.py", line 39, in import_data
    with CodingSystemImporter(
  File "…/coding_systems/base/import_data_utils.py", line 68, in __enter__
    self.setup_new_import_database()
  File "…/python3.12/contextlib.py", line 81, in inner
    return func(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^
  File "…/coding_systems/base/import_data_utils.py", line 171, in setup_new_import_database
    call_command("migrate", self.coding_system, database=database_alias)
  File "…/coding_systems/conftest.py", line 54, in mock_migrate_coding_system
    cursor.execute(sql)

⚠ Something I noticed: running only just test-py coding_systems/bnf/tests fails even when just test-py coding_systems/ as a whole does pass, on a correctly working branch (that is, with Django 4.2).


There's also info on multi database support in pytest-django's issue tracker: pytest-dev/pytest-django#924.


I also tested an older version of pytest-django (4.3.0) and get the same errors.

@mikerkelly
Copy link
Contributor

mikerkelly commented Jan 13, 2025

This is a complex topic with multiple interlinked issues. Thank you, Steve, for thoroughly investigating, reducing the remaining issues in your draft PR, and tracing the history and references. Below are my thoughts on several areas. Apologies for the length of this message. Happy to discuss any of this if that helps.

Overall, I think this has surfaced some issues with testing, implementation, and possibly architecture. We should consider how we respond to these. However, I don't think either main issue should block us from upgrading to Django 5.1 as soon as possible. There appear to be workarounds for the DatabaseOperationForbidden errors (see below), even if the eventual resolution involves a redesign. As you say, the AppConfig.ready() warning indicates a problem with when we establish the dynamic coding_systems database connections, but it's not a blocker. We should upgrade with acceptable adjustments to some tests and create some follow-on Issues.

Alternatively, we could initially move to Django 5.0, which doesn’t require immediate workarounds. However, I currently think we can go straight to Django 5.1.

We could enquire in pytest-django's project about these issues.

Database architecture (separate DB per coding system version)

Looking into this issue has made me further question the architecture of having a separate database per coding system version. This approach seems overly complicated, introducing many sharp edges and challenges in understanding and testing, as these issues demonstrate.

An alternative could be to structure each coding system as its own app, with a single database per coding system (rather than per version). These apps would add their databases to connections during app.Ready() execution, and the per-version scoping could occur within these databases, rather than in the database routing. This would eliminate the need for database dynamism after app startup. While this might introduce complexity in implementation, it could simplify the overall architecture.

I haven’t fully thought this through, and these concerns need to be weighed against the rationale for the current design laid out in:

DatabaseOperationForbidden errors

The DatabaseOperationForbidden errors occur because there's a mismatch between what these test cases that import databases dynamically are trying to do and how both the Django and pytest-django test case frameworks work. In a multi-database context, the test frameworks want to declare which databases are available to a test case at test-case-definition-time, which then triggers those databases being created, the test case wrapped in a transaction referencing those databases, and the databases flushed in clean up. Since 5.1 it's enforced that test cases only open connections to such databases, to avoid accidental access. But the coding system release import tests do want to access a database that they do not want to exist at the start of the test. That's in conflict. So maybe we shouldn't be using test cases this way at all?

We could try to write pytest-style test cases with custom database fixtures (or extensions of existing ones) to handle this case, but that seems challenging as it involves a lot of both pytest and Django internals, and there is not even a default database and Django machinery available if we start with bare pytest tests. Maybe we could switch from testing via these integration/unit tests to more functional/end-to-end tests that just shell out to call management commands, but I'm not sure if enough of the Django machinery can be made available without the test database already being set up. Maybe we can create the default database ourselves with SQL commands then not interact with the Django Python code directly (use management commands rather than interacting with the models).

The other direction I can think of for redesign is to make the code under test more granular and separable and do more isolated-unit-style test cases with more mocking. That seems more promising to me. coding_systems/bnf/import_data.py does a lot of work, doing file system access, file parsing, and coding system import (using CodingSystemImporter which has its own tests. Maybe this can be broken down further and mocks used. Maybe only one (subset of the) test case would care about database dynamism, the file system, etc.. There is already mock_migrate_coding_system -- maybe not all the tests even need this much?

What's odd is that commit refers to "SimpleTestCase". (I'm not sure how pytest-django interacts here, and which TestCase class it's using. I'm also not sure what magic pytest-django is doing to modify the error message, but the failure points to a line of Django code, despite the error message being different.) The behaviour if the database with enable_db_access_for_all_tests should be the same as Django's TestCase though.

pytest(-django) tests can inherit from SimpleTestCase (no DB access) or TestCase / TransactionTestCase (DB access) directly. However, most tests in this repository, including the failing ones, are "pytest-style" function-based tests rather than class-based tests. These tests use fixtures to share behavior, whereas Django-style tests rely on classes. More relevant here, pytest-django mostly relies on Django's underlying TransactionTestCase and TestCase classes internally in its fixtures for handling DB setup. Each pytest-django test gets run in a dynamically-generated TestCase/TransactionTestCase subclass.

The main setting for multi-db support is (Transaction)TestCase.databases. This tells Django which databases to consider for the test case. By default it's only default. It's possible to specify __all__ to include all configured databases from settings.py. Available databases are configured in pytest_django.fixtures._django_db_helper as part of a wrapper class, PytestDjangoTestCase, which uses settings.DATABASES default and any multi-database settings defined in the test.

The universe fixture extends the django_db_setup fixture. It uses the django_db_modify_db_settings hook to add database_aliases to settings.DATABASES. This triggers django.test.utils.setup_databases, which creates in-memory SQLite databases for each defined database.

Workarounds

The implementation and test design currently don't align well with the test framework. While we can raise an issue to address this mismatch long-term, in the short term, how can we fix or bypass the DatabaseOperationForbidden errors while still maintaining the value of these tests, ahead of a more significant redesign?

First of all, we could just disable the tests for now. Although we rely on the ability to import data, it's an occasional task, and we might still achieve some level of assurance that the correct behavior occurs without these tests. However, this approach isn't ideal as it reduces the overall test coverage.

Or we could try to patch the code raising the DatabaseOperationForbidden exception, with appropriate documentation around this, as we know that it's checking for something that isn't true in our test design.

Or we could try to fit our implementation and test better into what the framework expects. There are several other ways to patch or modify the tests to pass the new checks. I'm unsure this is a wise strategy beyond the short term. But maybe it's more appealing than simply turning the behaviour off. But I'm not sure it amounts to much different. I can't yet see a way to truly fix the issue that doesn't involve a significant redesign of the tests and/or implementation.

I gave some thought to such workarounds but I'm not sure they're better than just stopping the DatabaseOperationForbidden exception being raised directly, which is more explicit and clear that we don't think that check should apply in these tests. I'll put those in a separate message after this. At least it can help illustrate why I don't see a cleaner approach without significant resdesign.

AppConfig.ready() warning

There's also a related Django warning that we could ignore for now: RuntimeWarning: Accessing the database during app initialization is discouraged. To fix this warning, avoid executing queries in AppConfig.ready() or when your app modules are imported.

I think this warnings and the docs are pretty clear, we shouldn't be doing this. But it's not a blocker. Executing queries during app initialization can lead to issues, such as test processes unintentionally interacting with non-test databases. There are already comments in conftest.py noting this risk.

What could we do differently?

  • We could defer updating the connection per-database until it is first accessed during the application's lifetime.
  • We could explore options to delay this behavior until later in the application lifecycle, beyond startup.
  • We could redesign the database architecture as discussed above.
  • In test cases, initial databases are created in memory rather than on the file system. This makes the current behavior irrelevant in test contexts. Disabling it during testing might avoid unnecessary complexity.

Related-but-distinct issue: when running CodingSystemImporter.setup_new_import_database,
update_coding_system_database_connections is called which updates each
connections.databases[coding_system_release.database_alias] = database_dict, assuming that the database should exist on the file system. In test cases, these initial databases exist in memory, not on disk, so the method overwrites correct in-memory names with incorrect file system paths. I'm not sure that this breaks the connection but it was certainly confusing when using a debugger to interrogate connection state during test execution. Since only one database is being imported at a time, in this context, we could parameterise update_coding_system_database_connections to limit its effect to the specific database actually being created in the file system, rather than applying it globally.

Tests not running in isolation

Something I noticed: running only just test-py coding_systems/bnf/tests fails even when just test-py coding_systems/ as a whole does pass, on a correctly working branch (that is, with Django 4.2).

This implies that the test cases are not running in isolation, as they should. They're leaking state in the database, which they shouldn't. Other test cases rely on this, which they shouldn't. Specifically, these ones rely on a previous BNF coding system release being present. This abbreviated stack trace shows the issue:

File coding_systems/base/import_data_utils.py", line 108, in __exit__
    update_codelist_version_compatibility(
coding_systems/base/import_data_utils.py", line 218, in update_codelist_version_compatibility
    versions_to_check, previous_release = get_versions_compatible_with_previous(
                                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File coding_systems/base/import_data_utils.py", line 241, in get_versions_compatible_with_previous
    .latest("valid_from")
...
  File django/db/models/query.py", line 637, in get
    raise self.model.DoesNotExist(
coding_systems.versioning.models.CodingSystemRelease.DoesNotExist: CodingSystemRelease matching query does not exist.

The test_import_data tests fail when run in isolation but pass when executed alongside tests that call any fixture from opencodelists/tests/fixtures.py. Adding any such fixture temporarily resolves the issue in Django 4.2.

That indicates a problem with the design of those fixtures where universe and build_fixtures are called exactly once per test session to load some test data from the file system and put it in the default database, along with some objects created directly (such as the user_without_organisation). The fixtures built by build_fixture all extend universe then return the required object from the database, leaving these objects in the database after any test using these fixtures is executed.

As a result, test cases become dependent on the order in which tests are run, as the state of the database is shared across tests. This creates unpredictable behavior and makes tests harder to maintain. The database(s) should be clean and empty between tests, with each test setting up only the necessary data.

Another issue is that the current fixture setup makes it difficult to write certain types of tests, such as those that require a database to be empty of specific models or that involve mutating all instances of a certain type. For example, a test case that attempts to import data without a previous release, as mentioned earlier, or this simple case:

def test_database_initially_empty():
    from opencodelists.models import User
    assert User.objects.count() == 0

This passes when it runs in isolation, but fails when run with any other test case that uses the opencodelists/tests/fixtures.py .

A better medium-term approach could be to modify the fixture module to load the universe of test data into memory rather than directly into the database. The fixtures could then save only the specific objects they require to the database, extending other fixtures for related objects. This approach reduces the overhead of file system access and model instance construction to once per session, as in the current approach, but without polluting the database between test cases. Additionally, this in-memory representation could be useful for generating local dummy data, as discussed in #2117.

@mikerkelly
Copy link
Contributor

mikerkelly commented Jan 13, 2025

In my previous message I suggested that the DatabaseOperationForbidden errors occur because of a mismatch between our test design and the framework. I suggest the best workaround is to prevent the exception by patching the test framework being raised with suitable centralized documentation of why we are doing this. But maybe we can't do that or there are other ways? This comment explores a few. Maybe it will spark some ideas from someone else.

Steve tried:

conftest.py contains database test setup, and we could add the relevant databases being complained about there:

FAILED coding_systems/bnf/tests/test_import_data.py::test_import_data - django.db.utils.OperationalError: table "bnf_concept" already exists

I think that won't work for a full dynamic DB import test because if we add those databases, they would be initialized by the test framework. This would cause the test to fail, as it expects the database file and its schema/content not to be present until they are explicitly created during the test. We could try and remove the database the framework creates but that seems messy.

Why is the check failing? #17639 and #17879 added the checks to django.test.testcases.SimpleTestCase:

self.connection is None
and self.alias not in cls.databases
and self.alias != NO_DB_ALIAS
# Dynamically created connections are always allowed.
and self.alias in connections

This is in a patched extension to BaseDatabaseWrapper.ensure_connection, called when Django makes a query, and can trigger when it first accesses a database in process(?) lifetime, when self.connection isn't initialized. This is trying to avoid tests accidentally accessing databases that shouldn't be initialized yet that haven't been declared for the test environment (which might even be a "production" DB in whatever environment they're running in!).

There are three more conditions we could consider fixing or patching to pass these checks.

self.alias in connections sounds initially most promising for our case with the comment about dynamic connections but if we look at the test added in #17879 due to #35226 we can see this is thinking of a cursor-only copy of an existing connection, for testing connection pooling. We could modify mock_migrate_coding_system to use with connections[database].copy().cursor() as cursor and that then wouldn't trigger the error as it does now and the test case gets a little bit further, but all the other DB accesses to the dynamic DB rely on the connection being in Django's connections and use QuerySet.objects.using(db_alias) which, does trigger the error still. Maybe there's some way we can patch Django or all of the relevant implementation code to use a temporary cursor for every such query during these tests but that seems very heavyweight.

NO_DB_ALIAS is a Django constant representing operations that do not require a database connection. They're not blocked. It's trivial to mock that in this in this module, which seems to make the tests pass, but that's definitely a hack. Also, maybe it would cause problems if the real value were needed somehow.

Finally, cls.databases (cls being the dynamic PytestDjangoTestCase class) could be added to at test-runtime. That sounds a little more in line with the intended semantics, but I don't think it is, really -- cls.databases is not expected to mutate during test case execution. And for pytest-style test cases the test case isn't easily accessible to us as it's dynamically constructed. We could convert these test to Django-style class-based test cases, then we could twiddle type(self).databases during test setUp. That feels a bit smelly as we're modifying a class property, but it's a dynamic class so it shouldn't affect other tests. I've attached a diff showing roughly what that could look like, but I'm currently thinking it's not very appealing. class_based_testcases.diff.txt

@StevenMaude
Copy link
Contributor

StevenMaude commented Jan 15, 2025

Mike's explanation above is an excellent overview at the multiple issues that the change in behaviour of Django 5.1 exposes in our code and tests.

For quick reference, I've summarised the main problems, and the possible solutions Mike proposed, so that we can see at a glance when reviewing them.

We may want to open separate issues for the problems that do not block the Django 5.1 upgrade, but I've summarised them here too initially.


DatabaseOperationForbidden errors with the coding system databases

This is the most relevant problem here as it blocks the upgrade.

We'll use this issue to discuss this specific problem.

  • Rearchitect the coding system databases — not ideal, and almost certainly out of scope for this upgrade, but it may be needed in future.
  • Rework the tests into pytest test cases with custom fixtures — may be a lot of work.
  • Wrangling cls.databases in the tests — possible, but it's a way in which Django maybe doesn't expect.
  • Rework both code and tests — aim to remove where possible the dependency of testing on database access
  • Catch and ignore the DatabaseOperationForbidden exception in the tests.
  • Disable the broken tests entirely — by catching or patching out DatabaseOperationForbidden, or marking them with pytest as xfail or skip.

Coding system database connections are initialised in AppConfig.ready(), contrary to Django's best practice

Created #2289 for this.

This does not block upgrading, but may be behaviour that does block a future Django upgrade.

  • Rearchitect the coding system databases so that this initialisation is not needed.
  • Rework this code to defer updating connections at a later time.
  • Disable the behaviour in testing.

CodingSystemImporter.setup_new_import_database.update_coding_system_database_connections has a bug where it overwrites test in-memory paths with file system paths.

Created #2300 for this.

This does not block upgrading.

  • Fix the bug, which should be possible without rearchitecting the coding system databases.
  • But, doing that rearchitecting may remove this bug.

Coding system tests are not isolated

Created #2301 for this.

This does not block upgrading.

  • Load the test data into memory, instead of in the database.

@StevenMaude
Copy link
Contributor

StevenMaude commented Jan 15, 2025

The other consideration is that, aside from "we eventually will have to upgrade to a version of Django with this breaking behaviour" (in April 2026 when the 4.2 LTS is end-of-life), is that we want to upgrade to unblock #2268, which could improve the performance of the site for users.

We can then consider the trade-off: is making the tests worse in the short-term — working around them, or making them more hacky, without overhauling the coding system code and tests — worth that benefit?

My hunch is probably, yes.

In that context, I didn't think Mike's suggested class-based refactoring of the tests looked too bad for a short-term fix.

@StevenMaude
Copy link
Contributor

StevenMaude commented Jan 15, 2025

With regard to the blocking issue of the test failures, we talked through three options for short-term fixes this afternoon:

  1. patch out the underlying Django connection function in our test code, to ignore this failure and get back behaviour more like the previous non-breaking behaviour
    • ➕ single place of changing the code
    • ➖ magically changing the underlying Django behaviour is not great
    • ➖ we would have to maintain this with each Django upgrade
  2. wrangle cls.databases in tests
    • ➕ easy to understand the code; we're updating the databases accessed during test runtime…
    • ➖ …but this value is not expected by Django to be changed during test runtime
  3. handle the DatabaseOperationForbidden exception in tests
    • We were leaning towards this, but it turns out that this doesn't actually work as it might have done at a glance, and would need rethinking.

@mikerkelly
Copy link
Contributor

mikerkelly commented Jan 16, 2025

I couldn't find a way to make 1 from Steve's latest summary work after our discussion yesterday.

I think because the test class is constructed dynamically based on the required set of fixtures, the place you'd need to patch it doesn't have an accessible name at test execution time, that I can see. The test function code is read and its class defined before patches can take effect. Patching the class definition (either in django.test.testcases or the PytestDjangoTestCase from fixtures.py) won't affect the test case instance at that point. In the original post of pytest-dev/pytest-django#924 one of the devs specifically rules out the approach in pytest-dev/pytest-django#431 of allowing pytest-style test cases to access and modify their own class via fixture, so I'm not sure it's currently possible.

And 3 doesn't work because there isn't a way to return the control flow after the exception is raised.

With a Django-style class-based test case as in the cls.databases approach we can access the instance during test execution (eg as self in its methods), but I didn't have much luck understanding if we could stop from there the changes that SimpleTestCase make to ensure_connection. Also, if we make the effort of switching the import tests to class-based, modifying the databases parameter seems more intuitive and less risky. So I'm leaning towards this approach.

So 2, the cls.databases approach, is the only one that we have a working example of at this time, and it seems fairly clean and relatively easy to understand. It's also nice because it gives us an obvious place to document why this is needed (although it might also merit a pointer in some dev docs about testing approach and/or the database architecture). So currently this is the approach I favour for this issue.

I noted my concern that modifying cls.databases might not comply with the intended semantics. Here's the class hierarchy from within a DynamicDatabaseTestCase:

inspect.getmro(type(self))
(<class 'coding_systems.bnf.tests.test_import_data.TestImportData'>,
<class 'coding_systems.bnf.tests.test_import_data.DynamicDatabaseTestCase'>,
<class 'django.test.testcases.TestCase'>,
<class 'django.test.testcases.TransactionTestCase'>,
<class 'django.test.testcases.SimpleTestCase'>,
<class 'unittest.case.TestCase'>,
<class 'object'>

So it's inherits from unittest's TestCase and all of Django's SimpleTestCase, then TransactionTestCase, then TestCase. What does databases mean for the Django classes according to the docs?

SimpleTestCase.databases
SimpleTestCase disallows database queries by default. This helps to avoid executing write queries which will affect other tests since each SimpleTestCase test isn’t run in a transaction. If you aren’t concerned about this problem, you can disable this behavior by setting the databases class attribute to 'all' on your test class.

TestCase.databases
By default, only the default database will be wrapped in a transaction during a TestCase’s execution and attempts to query other databases will result in assertion errors to prevent state leaking between tests. Use the databases class attribute on the test class to request transaction wrapping against non-default databases.

TransactionTestCase.databases
Django sets up a test database corresponding to every database that is defined in the DATABASES definition in your settings and referred to by at least one test through databases.
...
As an optimization, Django only flushes the default database at the start of each test run. If your setup contains multiple databases, and you have a test that requires every database to be clean, you can use the databases attribute on the test suite to request extra databases to be flushed.

So I think the things it does are: specify the databases to be created at test startup (in a non-SimpleTestCase); wrap in a transaction against those databases; enforce that no other database is accessed; and flush those databases after the run. I think they're possibly not thinking of the dynamic-databases case, but I'm not sure. I'm coming around to the view that modifying databases during test execution time is a reasonable interpretation of how to use this framework to test a dynamically-created database. It does need to be removed again in cleanUp so the wrapping transaction doesn't try to do something with it, which it shouldn't need to, not will it be able to (it should exist in a temporary file location not touched by any other test).

So I think my prototype approach in class_based_testcases.diff.txt may be pretty close to something reasonable to use. It does need a proper location for the temporary files, maybe using tempfile rather than pytest stuff. The prototype isn't isolating the files which might leak state between tests and the two two test frameworks (although maybe it would be fine if the database file were cleaned up, which we can do). I'd be happy if the resolution of the DatabaseOperationForbidden issue is that someone (I suppose probably Steve) wanted to try that approach more broadly.

@bloodearnest
Copy link
Member

There's a lot here, and I'm not sure I've followed it all, and I maybe missing something.

But I wanted to ask a quick question, just to make sure that we're not missing a simpler solution?

Why are we considering upgrading to django 5.2? AFAIK, tts not been released yet? Unless I've missed something?

I ask, because it looks like the db fixtures changes/breaking is in 5.2, not 5.1, is that right?

I would expect that we would need to wait for django-pytest to support 5.2 before we upgraded to it?

Apologies if this has already been mentioned! There's a lot to read here.

@bloodearnest
Copy link
Member

The discussion in the tech team meeting clarified that this is still an issue in 5.1, so ignore me.

@mikerkelly
Copy link
Contributor

Apologies,@bloodearnest, some of my messages mentioned 5.1 vs 5.2 but that was just a thinko for what should have read 5.0 vs 5.1. Edited now.

StevenMaude added a commit that referenced this issue Jan 28, 2025
See #2115 for full details.

The problem this refactoring aims to solve is, in short, as quoted from
@mikerkelly in #2115:

> The [DatabaseOperationForbidden] errors occur because there's a mismatch
> between what [these test cases](https://github.com/opensafely-core/opencodelists/blob/main/coding_systems/bnf/tests/test_import_data.py)
> that import databases dynamically are
> trying to do and how both the Django and `pytest-django` test case
> frameworks work. In a multi-database context, the test frameworks want
> to declare which databases are available to a test case at
> test-case-definition-time, which then triggers those databases being
> created, the test case wrapped in a transaction referencing those
> databases, and the databases flushed in clean up. Since 5.1 it's
> enforced that test cases only open connections to such databases, to
> avoid accidental access. But the [coding system release import tests](https://github.com/opensafely-core/opencodelists/blob/main/coding_systems/bnf/tests/test_import_data.py)
> do want to access a database that they do not want to exist at the start
> of the test. That's in conflict. So maybe we shouldn't be using test cases
> this way at all?

Mike proposed several ideas to fix this that we looked at, and we found that:
amending the test case attribute that defines the allowed databases
seems reasonable, and, further, this was the only solution that we had working.
StevenMaude added a commit that referenced this issue Jan 28, 2025
See #2115 for full details.

The problem this refactoring aims to solve is, in short, as quoted from
@mikerkelly in #2115:

> The [DatabaseOperationForbidden] errors occur because there's a mismatch
> between what [these test cases](https://github.com/opensafely-core/opencodelists/blob/main/coding_systems/bnf/tests/test_import_data.py)
> that import databases dynamically are
> trying to do and how both the Django and `pytest-django` test case
> frameworks work. In a multi-database context, the test frameworks want
> to declare which databases are available to a test case at
> test-case-definition-time, which then triggers those databases being
> created, the test case wrapped in a transaction referencing those
> databases, and the databases flushed in clean up. Since 5.1 it's
> enforced that test cases only open connections to such databases, to
> avoid accidental access. But the [coding system release import tests](https://github.com/opensafely-core/opencodelists/blob/main/coding_systems/bnf/tests/test_import_data.py)
> do want to access a database that they do not want to exist at the start
> of the test. That's in conflict. So maybe we shouldn't be using test cases
> this way at all?

Mike proposed several ideas to fix this that we looked at, and we found that:
amending the test case attribute that defines the allowed databases
seems reasonable, and, further, this was the only solution that we had working.
StevenMaude added a commit that referenced this issue Jan 28, 2025
See #2115 for full details.

The problem this refactoring aims to solve is, in short, as quoted from
@mikerkelly in #2115:

> The [DatabaseOperationForbidden] errors occur because there's a mismatch
> between what [these test cases](https://github.com/opensafely-core/opencodelists/blob/main/coding_systems/bnf/tests/test_import_data.py)
> that import databases dynamically are
> trying to do and how both the Django and `pytest-django` test case
> frameworks work. In a multi-database context, the test frameworks want
> to declare which databases are available to a test case at
> test-case-definition-time, which then triggers those databases being
> created, the test case wrapped in a transaction referencing those
> databases, and the databases flushed in clean up. Since 5.1 it's
> enforced that test cases only open connections to such databases, to
> avoid accidental access. But the [coding system release import tests](https://github.com/opensafely-core/opencodelists/blob/main/coding_systems/bnf/tests/test_import_data.py)
> do want to access a database that they do not want to exist at the start
> of the test. That's in conflict. So maybe we shouldn't be using test cases
> this way at all?

Mike proposed several ideas to fix this that we looked at, and we found that:
amending the test case attribute that defines the allowed databases
seems reasonable, and, further, this was the only solution that we had working.
StevenMaude added a commit that referenced this issue Jan 28, 2025
See #2115 for full details.

The problem this refactoring aims to solve is, in short, as quoted from
@mikerkelly in #2115:

> The [DatabaseOperationForbidden] errors occur because there's a mismatch
> between what [these test cases](https://github.com/opensafely-core/opencodelists/blob/main/coding_systems/bnf/tests/test_import_data.py)
> that import databases dynamically are
> trying to do and how both the Django and `pytest-django` test case
> frameworks work. In a multi-database context, the test frameworks want
> to declare which databases are available to a test case at
> test-case-definition-time, which then triggers those databases being
> created, the test case wrapped in a transaction referencing those
> databases, and the databases flushed in clean up. Since 5.1 it's
> enforced that test cases only open connections to such databases, to
> avoid accidental access. But the [coding system release import tests](https://github.com/opensafely-core/opencodelists/blob/main/coding_systems/bnf/tests/test_import_data.py)
> do want to access a database that they do not want to exist at the start
> of the test. That's in conflict. So maybe we shouldn't be using test cases
> this way at all?

Mike proposed several ideas to fix this that we looked at, and we found that:
amending the test case attribute that defines the allowed databases
seems reasonable, and, further, this was the only solution that we had working.

Co-authored-by: Mike Kelly <mike.kelly@thedatalab.org>
StevenMaude added a commit that referenced this issue Jan 30, 2025
See #2115 for full details.

The problem this refactoring aims to solve is, in short, as quoted from
@mikerkelly in #2115:

> The [DatabaseOperationForbidden] errors occur because there's a mismatch
> between what [these test cases](https://github.com/opensafely-core/opencodelists/blob/main/coding_systems/bnf/tests/test_import_data.py)
> that import databases dynamically are
> trying to do and how both the Django and `pytest-django` test case
> frameworks work. In a multi-database context, the test frameworks want
> to declare which databases are available to a test case at
> test-case-definition-time, which then triggers those databases being
> created, the test case wrapped in a transaction referencing those
> databases, and the databases flushed in clean up. Since 5.1 it's
> enforced that test cases only open connections to such databases, to
> avoid accidental access. But the [coding system release import tests](https://github.com/opensafely-core/opencodelists/blob/main/coding_systems/bnf/tests/test_import_data.py)
> do want to access a database that they do not want to exist at the start
> of the test. That's in conflict. So maybe we shouldn't be using test cases
> this way at all?

Mike proposed several ideas to fix this that we looked at, and we found that:
amending the test case attribute that defines the allowed databases
seems reasonable, and, further, this was the only solution that we had working.

Co-authored-by: Mike Kelly <mike.kelly@thedatalab.org>
StevenMaude added a commit that referenced this issue Jan 30, 2025
See #2115 for full details.

The problem this refactoring aims to solve is, in short, as quoted from
@mikerkelly in #2115:

> The [DatabaseOperationForbidden] errors occur because there's a mismatch
> between what [these test cases](https://github.com/opensafely-core/opencodelists/blob/main/coding_systems/bnf/tests/test_import_data.py)
> that import databases dynamically are
> trying to do and how both the Django and `pytest-django` test case
> frameworks work. In a multi-database context, the test frameworks want
> to declare which databases are available to a test case at
> test-case-definition-time, which then triggers those databases being
> created, the test case wrapped in a transaction referencing those
> databases, and the databases flushed in clean up. Since 5.1 it's
> enforced that test cases only open connections to such databases, to
> avoid accidental access. But the [coding system release import tests](https://github.com/opensafely-core/opencodelists/blob/main/coding_systems/bnf/tests/test_import_data.py)
> do want to access a database that they do not want to exist at the start
> of the test. That's in conflict. So maybe we shouldn't be using test cases
> this way at all?

Mike proposed several ideas to fix this that we looked at, and we found that:
amending the test case attribute that defines the allowed databases
seems reasonable, and, further, this was the only solution that we had working.

Co-authored-by: Mike Kelly <mike.kelly@thedatalab.org>
StevenMaude added a commit that referenced this issue Jan 30, 2025
See #2115 for full details.

The problem this refactoring aims to solve is, in short, as quoted from
@mikerkelly in #2115:

> The [DatabaseOperationForbidden] errors occur because there's a mismatch
> between what [these test cases](https://github.com/opensafely-core/opencodelists/blob/main/coding_systems/bnf/tests/test_import_data.py)
> that import databases dynamically are
> trying to do and how both the Django and `pytest-django` test case
> frameworks work. In a multi-database context, the test frameworks want
> to declare which databases are available to a test case at
> test-case-definition-time, which then triggers those databases being
> created, the test case wrapped in a transaction referencing those
> databases, and the databases flushed in clean up. Since 5.1 it's
> enforced that test cases only open connections to such databases, to
> avoid accidental access. But the [coding system release import tests](https://github.com/opensafely-core/opencodelists/blob/main/coding_systems/bnf/tests/test_import_data.py)
> do want to access a database that they do not want to exist at the start
> of the test. That's in conflict. So maybe we shouldn't be using test cases
> this way at all?

Mike proposed several ideas to fix this that we looked at, and we found that:
amending the test case attribute that defines the allowed databases
seems reasonable, and, further, this was the only solution that we had working.

It was decided that refactoring the existing tests was the solution for
now, rather than rethinking them or the code under test entirely. This
facilitates the upgrade to Django 5.1 without blocking us further. We
may have to revisit this code and these tests in future, with a view to
a more comprehensive overhaul.

This single class facilitates getting most of the tests working by just
moving them into a test class that uses this DynamicDatabaseTestCase
class. Those tests have a generalisable pattern for accessing the
databases. The BNF and base coding system tests are slightly more
complicated, and the issues will be described in those respective
subsequent commits.

Co-authored-by: Mike Kelly <mike.kelly@thedatalab.org>
StevenMaude added a commit that referenced this issue Jan 30, 2025
See #2115 for full details.

The problem this refactoring aims to solve is, in short, as quoted from
@mikerkelly in #2115:

> The [DatabaseOperationForbidden] errors occur because there's a mismatch
> between what [these test cases](https://github.com/opensafely-core/opencodelists/blob/main/coding_systems/bnf/tests/test_import_data.py)
> that import databases dynamically are
> trying to do and how both the Django and `pytest-django` test case
> frameworks work. In a multi-database context, the test frameworks want
> to declare which databases are available to a test case at
> test-case-definition-time, which then triggers those databases being
> created, the test case wrapped in a transaction referencing those
> databases, and the databases flushed in clean up. Since 5.1 it's
> enforced that test cases only open connections to such databases, to
> avoid accidental access. But the [coding system release import tests](https://github.com/opensafely-core/opencodelists/blob/main/coding_systems/bnf/tests/test_import_data.py)
> do want to access a database that they do not want to exist at the start
> of the test. That's in conflict. So maybe we shouldn't be using test cases
> this way at all?

Mike proposed several ideas to fix this that we looked at, and we found that:
amending the test case attribute that defines the allowed databases
seems reasonable, and, further, this was the only solution that we had working.

It was decided that refactoring the existing tests was the solution for
now, rather than rethinking them or the code under test entirely. This
facilitates the upgrade to Django 5.1 without blocking us further. We
may have to revisit this code and these tests in future, with a view to
a more comprehensive overhaul.

This single class facilitates getting most of the tests working by just
moving them into a test class that uses this DynamicDatabaseTestCase
class. Those tests have a generalisable pattern for accessing the
databases. The BNF and base coding system tests are slightly more
complicated, and the issues will be described in those respective
subsequent commits.

Co-authored-by: Mike Kelly <mike.kelly@thedatalab.org>
StevenMaude added a commit that referenced this issue Jan 30, 2025
See #2115 for full details.

The problem this refactoring aims to solve is, in short, as quoted from
@mikerkelly in #2115:

> The [DatabaseOperationForbidden] errors occur because there's a mismatch
> between what [these test cases](https://github.com/opensafely-core/opencodelists/blob/main/coding_systems/bnf/tests/test_import_data.py)
> that import databases dynamically are
> trying to do and how both the Django and `pytest-django` test case
> frameworks work. In a multi-database context, the test frameworks want
> to declare which databases are available to a test case at
> test-case-definition-time, which then triggers those databases being
> created, the test case wrapped in a transaction referencing those
> databases, and the databases flushed in clean up. Since 5.1 it's
> enforced that test cases only open connections to such databases, to
> avoid accidental access. But the [coding system release import tests](https://github.com/opensafely-core/opencodelists/blob/main/coding_systems/bnf/tests/test_import_data.py)
> do want to access a database that they do not want to exist at the start
> of the test. That's in conflict. So maybe we shouldn't be using test cases
> this way at all?

Mike proposed several ideas to fix this that we looked at, and we found that:
amending the test case attribute that defines the allowed databases
seems reasonable, and, further, this was the only solution that we had working.

It was decided that refactoring the existing tests was the solution for
now, rather than rethinking them or the code under test entirely. This
facilitates the upgrade to Django 5.1 without blocking us further. We
may have to revisit this code and these tests in future, with a view to
a more comprehensive overhaul.

This single class facilitates getting most of the tests working by just
moving them into a test class that uses this DynamicDatabaseTestCase
class. Those tests have a generalisable pattern for accessing the
databases. The BNF and base coding system tests are slightly more
complicated, and the issues will be described in those respective
subsequent commits.

Co-authored-by: Mike Kelly <mike.kelly@thedatalab.org>
StevenMaude added a commit that referenced this issue Jan 30, 2025
See #2115 for full details.

The problem this refactoring aims to solve is, in short, as quoted from
@mikerkelly in #2115:

> The [DatabaseOperationForbidden] errors occur because there's a mismatch
> between what [these test cases](https://github.com/opensafely-core/opencodelists/blob/main/coding_systems/bnf/tests/test_import_data.py)
> that import databases dynamically are
> trying to do and how both the Django and `pytest-django` test case
> frameworks work. In a multi-database context, the test frameworks want
> to declare which databases are available to a test case at
> test-case-definition-time, which then triggers those databases being
> created, the test case wrapped in a transaction referencing those
> databases, and the databases flushed in clean up. Since 5.1 it's
> enforced that test cases only open connections to such databases, to
> avoid accidental access. But the [coding system release import tests](https://github.com/opensafely-core/opencodelists/blob/main/coding_systems/bnf/tests/test_import_data.py)
> do want to access a database that they do not want to exist at the start
> of the test. That's in conflict. So maybe we shouldn't be using test cases
> this way at all?

Mike proposed several ideas to fix this that we looked at, and we found that:
amending the test case attribute that defines the allowed databases
seems reasonable, and, further, this was the only solution that we had working.

It was decided that refactoring the existing tests was the solution for
now, rather than rethinking them or the code under test entirely. This
facilitates the upgrade to Django 5.1 without blocking us further. We
may have to revisit this code and these tests in future, with a view to
a more comprehensive overhaul.

This single class facilitates getting most of the tests working by just
moving them into a test class that uses this DynamicDatabaseTestCase
class. Those tests have a generalisable pattern for accessing the
databases. The BNF and base coding system tests are slightly more
complicated, and the issues will be described in those respective
subsequent commits.

Co-authored-by: Mike Kelly <mike.kelly@thedatalab.org>
StevenMaude added a commit that referenced this issue Feb 5, 2025
See #2115 for full details.

The problem this refactoring aims to solve is, in short, as quoted from
@mikerkelly in #2115:

> The [DatabaseOperationForbidden] errors occur because there's a mismatch
> between what [these test cases](https://github.com/opensafely-core/opencodelists/blob/main/coding_systems/bnf/tests/test_import_data.py)
> that import databases dynamically are
> trying to do and how both the Django and `pytest-django` test case
> frameworks work. In a multi-database context, the test frameworks want
> to declare which databases are available to a test case at
> test-case-definition-time, which then triggers those databases being
> created, the test case wrapped in a transaction referencing those
> databases, and the databases flushed in clean up. Since 5.1 it's
> enforced that test cases only open connections to such databases, to
> avoid accidental access. But the [coding system release import tests](https://github.com/opensafely-core/opencodelists/blob/main/coding_systems/bnf/tests/test_import_data.py)
> do want to access a database that they do not want to exist at the start
> of the test. That's in conflict. So maybe we shouldn't be using test cases
> this way at all?

Mike proposed several ideas to fix this that we looked at, and we found that:
amending the test case attribute that defines the allowed databases
seems reasonable, and, further, this was the only solution that we had working.

It was decided that refactoring the existing tests was the solution for
now, rather than rethinking them or the code under test entirely. This
facilitates the upgrade to Django 5.1 without blocking us further. We
may have to revisit this code and these tests in future, with a view to
a more comprehensive overhaul.

This single class facilitates getting most of the tests working by just
moving them into a test class that uses this DynamicDatabaseTestCase
class. Those tests have a generalisable pattern for accessing the
databases. The BNF and base coding system tests are slightly more
complicated, and the issues will be described in those respective
subsequent commits.

Co-authored-by: Mike Kelly <mike.kelly@thedatalab.org>
StevenMaude added a commit that referenced this issue Feb 6, 2025
See #2115 for full details.

The problem this refactoring aims to solve is, in short, as quoted from
@mikerkelly in #2115:

> The [DatabaseOperationForbidden] errors occur because there's a mismatch
> between what [these test cases](https://github.com/opensafely-core/opencodelists/blob/main/coding_systems/bnf/tests/test_import_data.py)
> that import databases dynamically are
> trying to do and how both the Django and `pytest-django` test case
> frameworks work. In a multi-database context, the test frameworks want
> to declare which databases are available to a test case at
> test-case-definition-time, which then triggers those databases being
> created, the test case wrapped in a transaction referencing those
> databases, and the databases flushed in clean up. Since 5.1 it's
> enforced that test cases only open connections to such databases, to
> avoid accidental access. But the [coding system release import tests](https://github.com/opensafely-core/opencodelists/blob/main/coding_systems/bnf/tests/test_import_data.py)
> do want to access a database that they do not want to exist at the start
> of the test. That's in conflict. So maybe we shouldn't be using test cases
> this way at all?

Mike proposed several ideas to fix this that we looked at, and we found that:
amending the test case attribute that defines the allowed databases
seems reasonable, and, further, this was the only solution that we had working.

Altering this attribute is mentioned in Django's documentation:

https://docs.djangoproject.com/en/5.1/topics/testing/tools/#multi-database-support

It was decided that refactoring the existing tests was the solution for
now, rather than rethinking them or the code under test entirely. This
facilitates the upgrade to Django 5.1 without blocking us further. We
may have to revisit this code and these tests in future, with a view to
a more comprehensive overhaul.

The `DynamicDatabaseTestCase class facilitates getting most of the tests
working by just moving them into a test class that uses this
DynamicDatabaseTestCase class. Those tests have a generalisable pattern
for accessing the databases.

The tests that need refactoring fall into one of three general patterns:

* most tests that do not need a temporary coding system database
  directory, handled by the `DynamicDatabaseTestCase` class
* some tests that need a temporary coding system database directory,
  handled by the `DynamicDatabaseTestCaseWithTmpPath` class, which
  extends `DynamicDatabaseTestCase`
* a few edge cases that are discussed in subsequent commits; these will
  be worked around, rather than refactored in this series of commits
  (though they could be reworked in future):
  * tests in `coding_systems/base/tests/test_import_data_utils.py` that
    access the database in fixtures, before the test class' `setUp()`
    runs; these will be handled in that module specifically
  * tests in `coding_systems/bnf/tests/test_import_data.py` that uses a
    pytest fixture to produce import data, instead of fixtures stored in
    the repository, unlike the other tests

Co-authored-by: Mike Kelly <mike.kelly@thedatalab.org>
StevenMaude added a commit that referenced this issue Feb 6, 2025
See #2115 for full details.

The problem this refactoring aims to solve is, in short, as quoted from
@mikerkelly in #2115:

> The [DatabaseOperationForbidden] errors occur because there's a mismatch
> between what [these test cases](https://github.com/opensafely-core/opencodelists/blob/main/coding_systems/bnf/tests/test_import_data.py)
> that import databases dynamically are
> trying to do and how both the Django and `pytest-django` test case
> frameworks work. In a multi-database context, the test frameworks want
> to declare which databases are available to a test case at
> test-case-definition-time, which then triggers those databases being
> created, the test case wrapped in a transaction referencing those
> databases, and the databases flushed in clean up. Since 5.1 it's
> enforced that test cases only open connections to such databases, to
> avoid accidental access. But the [coding system release import tests](https://github.com/opensafely-core/opencodelists/blob/main/coding_systems/bnf/tests/test_import_data.py)
> do want to access a database that they do not want to exist at the start
> of the test. That's in conflict. So maybe we shouldn't be using test cases
> this way at all?

Mike proposed several ideas to fix this that we looked at, and we found that:
amending the test case attribute that defines the allowed databases
seems reasonable, and, further, this was the only solution that we had working.

Altering this attribute is mentioned in Django's documentation:

https://docs.djangoproject.com/en/5.1/topics/testing/tools/#multi-database-support

It was decided that refactoring the existing tests was the solution for
now, rather than rethinking them or the code under test entirely. This
facilitates the upgrade to Django 5.1 without blocking us further. We
may have to revisit this code and these tests in future, with a view to
a more comprehensive overhaul.

The `DynamicDatabaseTestCase class facilitates getting most of the tests
working by just moving them into a test class that uses this
DynamicDatabaseTestCase class. Those tests have a generalisable pattern
for accessing the databases.

The tests that need refactoring fall into one of three general patterns:

* most tests that do not need a temporary coding system database
  directory, handled by the `DynamicDatabaseTestCase` class
* some tests that need a temporary coding system database directory,
  handled by the `DynamicDatabaseTestCaseWithTmpPath` class, which
  extends `DynamicDatabaseTestCase`
* a few edge cases that are discussed in subsequent commits; these will
  be worked around, rather than refactored in this series of commits
  (though they could be reworked in future):
  * tests in `coding_systems/base/tests/test_import_data_utils.py` that
    access the database in fixtures, before the test class' `setUp()`
    runs; these will be handled in that module specifically
  * tests in `coding_systems/bnf/tests/test_import_data.py` that uses a
    pytest fixture to produce import data, instead of fixtures stored in
    the repository, unlike the other tests

Co-authored-by: Mike Kelly <mike.kelly@thedatalab.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants