[FIX] fastapi: Forwardport 16.0 pullrequest 486 - Avoid zombie threads #499

Open
wants to merge 6 commits into base: 18.0

Conversation

@lembregtse commented Feb 25, 2025

This is a forward-port of #486.

I am aware that that PR is not yet finalized / approved, but we required the fix for version 18.0.

We have modified the caching system to be aligned with Odoo's new way of doing caches and refreshes.

We will keep this PR updated with the downstream PR.

@OCA-git-bot

Hi @lmignon,
some modules you are maintaining are being modified, check this out!

@lembregtse force-pushed the 18.0-fastapi-fwp-16.0-event-loop-lifecycle branch from 440b332 to fea0cef on February 25, 2025 06:30
@lmignon left a comment

Thank you @lembregtse for the forward port. Nevertheless, can you preserve the authorship of the initial changes in 16.0? To preserve the author of the initial change, simply cherry-pick the PR commit from branch 16.0 and then make your changes in a new commit. This will also make it easy to see what adaptations have been made between the two versions.
Out of curiosity, what problems did you encounter with the previous implementation and in what context?

Each time a fastapi app is created, a new event loop thread is created by the ASGIMiddleware. Unfortunately, every time the cache is cleared, a new app is created with a new event loop thread. This leads to an increase in the number of threads created to manage the asyncio event loop, even though many of them are no longer in use. To avoid this problem, the thread in charge of the event loop is now created only once per thread / process and the result is stored in the thread's local storage. If a new instance of an app needs to be created following a cache reset, this ensures that the same event loop is reused.
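
For illustration, a minimal sketch of that approach, assuming a hypothetical `get_event_loop` helper (not the module's actual API): one event loop thread is kept per worker thread in thread-local storage and reused across app rebuilds.

```python
import asyncio
import threading

_local = threading.local()  # per-thread storage for the loop and its runner thread


def get_event_loop() -> asyncio.AbstractEventLoop:
    """Return this thread's event loop, creating it only on first use."""
    loop = getattr(_local, "loop", None)
    if loop is None or loop.is_closed():
        loop = asyncio.new_event_loop()
        runner = threading.Thread(target=loop.run_forever, daemon=True)
        runner.start()
        _local.loop = loop
        _local.runner = runner
    return loop


# A rebuilt app would then submit its coroutines to the shared loop, e.g.:
# asyncio.run_coroutine_threadsafe(coro, get_event_loop())
```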

refs OCA#484
This commit adds event loop lifecycle management to the FastAPI dispatcher.

Before this commit, an event loop and the thread to run it were created
each time a FastAPI app was created. The drawback of this approach is that
when the app was destroyed (for example, when the cache of apps was cleared),
the event loop and the thread were not properly stopped, which could lead
to memory leaks and zombie threads. This commit fixes the issue by creating
a pool of event loops and threads that is shared among all FastAPI apps.
On each call to a FastAPI app, an event loop is requested from the pool and
is returned to the pool when the app is destroyed. When an event loop is
requested, the pool tries to reuse an existing one; if none is available,
a new event loop is created.
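
As a rough sketch of the pooling idea described above (class and method names are illustrative, not the actual dispatcher code):

```python
import asyncio
import queue
import threading


class EventLoopPool:
    """Pool of event loops, each running in its own daemon thread."""

    def __init__(self):
        self._pool: queue.Queue = queue.Queue()

    def _new_loop(self):
        loop = asyncio.new_event_loop()
        thread = threading.Thread(target=loop.run_forever, daemon=True)
        thread.start()
        return loop, thread

    def get(self):
        # Reuse an idle loop if one is available, otherwise create a new one.
        try:
            return self._pool.get_nowait()
        except queue.Empty:
            return self._new_loop()

    def release(self, loop, thread):
        # Hand the loop back so the next app instance can reuse it.
        self._pool.put((loop, thread))

    def shutdown(self):
        # On server shutdown, stop and close every pooled loop.
        while True:
            try:
                loop, thread = self._pool.get_nowait()
            except queue.Empty:
                break
            loop.call_soon_threadsafe(loop.stop)
            thread.join()
            loop.close()
```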

The cache of FastAPI apps is also refactored to use its own mechanism.
It is now based on a dictionary of queues keyed by root path and by database,
where each queue is a pool of FastAPI apps. This allows better management
of cache invalidation: it is now possible to invalidate the cached FastAPI
apps of one root path without affecting the cache of other root paths.
On server shutdown, ensure that the created event loops are closed properly.
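
As a sketch of the refactored cache shape described above (the `build_app` factory and the other names are hypothetical, not the module's actual code):

```python
import queue
from collections import defaultdict


def build_app(db_name: str, root_path: str):
    """Hypothetical factory; the real code would build the FastAPI app here."""
    return object()  # placeholder


# One pool (queue) of ready-to-use apps per database and per root path.
app_cache = defaultdict(lambda: defaultdict(queue.Queue))


def get_app(db_name: str, root_path: str):
    try:
        # Reuse a cached app for this database/root path if one is free.
        return app_cache[db_name][root_path].get_nowait()
    except queue.Empty:
        return build_app(db_name, root_path)


def release_app(db_name: str, root_path: str, app):
    # Hand the app back so the next request can pick it up again.
    app_cache[db_name][root_path].put_nowait(app)


def invalidate(db_name: str, root_path: str):
    # Dropping one queue invalidates only that root path's apps and leaves
    # apps cached under other root paths untouched.
    app_cache[db_name].pop(root_path, None)
```
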
defaultdict in Python is not thread safe. Since this data structure
is used to store the cache of FastAPI apps, we must ensure that
access to this cache is thread safe. This is done by using a lock
to protect access to the cache.
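
A minimal sketch of that guard, reusing the hypothetical `app_cache` shape from the snippet above (the follow-up commit below removes this outer lock again):

```python
import queue
import threading
from collections import defaultdict

app_cache = defaultdict(lambda: defaultdict(queue.Queue))
_cache_lock = threading.Lock()


def get_queue(db_name: str, root_path: str) -> queue.Queue:
    # Looking up a missing key mutates the defaultdict, and that mutation is
    # not guaranteed to be atomic, so the access is guarded by a lock.
    with _cache_lock:
        return app_cache[db_name][root_path]
```
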
This commit improves the lifecycle of the fastapi app cache.
It first ensures that the cache is effectively invalidated when changes
are made to the app configuration, even if these changes occur in another
server instance.
It also removes the locking mechanism put in place to ensure thread-safe
access to a value in the cache and to avoid a potential concurrency issue
when a default value is set in the cache at access time. This lock could
lead to unnecessary contention and reduce the performance benefits of
queue.Queue's fine-grained internal synchronization for a questionable gain.
The only expected gain was to avoid the useless creation of a queue.Queue
instance that would never be used, since at the time of putting the value
into the cache we are sure that a value is already present in the dictionary.
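
To make that trade-off concrete, a hypothetical before/after of the release path (again with illustrative names, not the module's actual code):

```python
import queue
import threading
from collections import defaultdict

app_cache = defaultdict(lambda: defaultdict(queue.Queue))
_cache_lock = threading.Lock()


# Before: an outer lock guarded the put, although queue.Queue is already
# internally synchronized and the queue is known to exist at release time.
def release_app_locked(db_name: str, root_path: str, app) -> None:
    with _cache_lock:
        app_cache[db_name][root_path].put_nowait(app)


# After: rely on queue.Queue's own fine-grained locking; the only thing the
# outer lock avoided was creating a Queue instance that would never be used,
# which cannot happen here because the queue was created at request time.
def release_app(db_name: str, root_path: str, app) -> None:
    app_cache[db_name][root_path].put_nowait(app)
```
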
@lembregtse force-pushed the 18.0-fastapi-fwp-16.0-event-loop-lifecycle branch from fea0cef to f031284 on February 25, 2025 07:48
@lembregtse

@lmignon Of course, my bad, done. We are running Odoo containerized with gunicorn and ran into an issue at a customer where we use FastAPI for ETL data imports: on certain objects/calls where the cache was refreshed, zombie threads were being created.

Considering we have a quite strict thread-spawning limit in those containers, after about 60 to 70 calls the threads got exhausted and the FastAPI app would no longer accept new connections until the worker was recycled. While debugging the issue we came across the PR for version 16.0.

…linting

[FIX] fastapi: Apply linting recommendations in 18
@lembregtse force-pushed the 18.0-fastapi-fwp-16.0-event-loop-lifecycle branch from f031284 to 56a6d4a on February 25, 2025 07:52

lmignon commented Feb 25, 2025

> @lmignon Of course, my bad, done. We are running Odoo containerized with gunicorn and ran into an issue at a customer where we use FastAPI for ETL data imports: on certain objects/calls where the cache was refreshed, zombie threads were being created.

Thank you for the explanation and your changes. Indeed, this PR should solve your issue. We weren't affected by this at acsone, because our instances always run in multi-worker mode and not in multi-thread mode. Solving this problem was an opportunity to improve the management of the event loop pool and the cache, which, even if they weren't problematic in our case, were still not optimal. Still out of curiosity, what are the motivations for preferring gunicorn to serve Odoo instead of Odoo's multi-worker runner? Don't you lose the mechanisms for managing memory per worker and the maximum process execution time?
Kind regards.

@lmignon left a comment

LGTM (Code review only)
