You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
subflow_query in the cancellation_cleanup service queries all columns in the flow_run table. This has a HUGE effect on the query runtime. From what I can see, only ~4 columns are actually required though to cancel the subflows. I believe sqlalchemy supports partial objects so the code change would be minimal. On our Prefect database*, a select * query takes 33s vs 0.7s for select id, state_type, parent_task_run_id, deployment_id. I think this might also help issue #16299.
Current query (as I understand it):
SELECT*FROM flow_run
WHERE state_type IN ('PENDING', 'SCHEDULED', 'RUNNING', 'PAUSED', 'CANCELLING')
AND id >'00000000-0000-0000-0000-000000000000'AND parent_task_run_id IS NOT NULLORDER BY id
LIMIT200;
Proposed query:
SELECT id, state_type, parent_task_run_id, deployment_id -- Only essential columnsFROM flow_run
WHERE state_type IN ('PENDING', 'SCHEDULED', 'RUNNING', 'PAUSED', 'CANCELLING')
AND id >'00000000-0000-0000-0000-000000000000'AND parent_task_run_id IS NOT NULLORDER BY id
LIMIT200;
Background: we have been struggling with our Postgres database being under heavy load and as a result both RecentDeploymentsScheduler and CancellationCleanup have been taking longer than their loop interval (~5-8s and ~50-70s vs 5s and 20s default loop intervals respectively). Rather than just beefing up our database, it seemed there was some potential for efficiency improvements. When looking at top SQL statements the above query is the heaviest by an order of magnitude. Disclaimer: I don't have much experience with databases.
*Our Prefect database is probably moderately sized: ~400k flow runs
Version info
Version: 3.0.2
API version: 0.8.4
Python version: 3.11.11
Git commit: c846de02
Built: Fri, Sep 13, 2024 10:48 AM
OS/Arch: linux/x86_64
Profile: default
Server type: server
Pydantic version: 2.10.5
Additional context
No response
The text was updated successfully, but these errors were encountered:
Small update: I think the above would fix our problem but the underlying issue was actually because we had a few (~40) flow runs which had a big matrix (~50MB) as input parameter. This then caused a query to the flow_run table to be very slow.
Bug summary
subflow_query
in the cancellation_cleanup service queries all columns in theflow_run
table. This has a HUGE effect on the query runtime. From what I can see, only ~4 columns are actually required though to cancel the subflows. I believe sqlalchemy supports partial objects so the code change would be minimal. On our Prefect database*, aselect *
query takes 33s vs 0.7s forselect id, state_type, parent_task_run_id, deployment_id
. I think this might also help issue #16299.Current query (as I understand it):
Proposed query:
Background: we have been struggling with our Postgres database being under heavy load and as a result both
RecentDeploymentsScheduler
andCancellationCleanup
have been taking longer than their loop interval (~5-8s and ~50-70s vs 5s and 20s default loop intervals respectively). Rather than just beefing up our database, it seemed there was some potential for efficiency improvements. When looking at top SQL statements the above query is the heaviest by an order of magnitude. Disclaimer: I don't have much experience with databases.*Our Prefect database is probably moderately sized: ~400k flow runs
Version info
Additional context
No response
The text was updated successfully, but these errors were encountered: