
Server stop will wait for all loading models to complete before unloading all models #324

Merged: 4 commits from jacky-server-stop into main on Jan 31, 2024

Conversation

@kthui (Contributor) commented on Jan 26, 2024

Related PR: triton-inference-server/server#6837

When the server is stopping and trying to unload all models, the unload-all step will wait until no model is in transition (i.e. loading or unloading) before starting the unload.
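A minimal sketch of the ordering this PR introduces, assuming hypothetical names (AnyModelTransitioning() and UnloadAllModels() stand in for the real model_repository_manager internals):

#include <atomic>
#include <chrono>
#include <thread>

std::atomic<int> models_in_transition{0};  // bumped while any model loads/unloads

bool AnyModelTransitioning() { return models_in_transition.load() > 0; }
void UnloadAllModels() { /* unload every registered model */ }

void StopAndUnloadAllModels()
{
  // Wait until no model is loading or unloading...
  while (AnyModelTransitioning()) {
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
  }
  // ...and only then begin unloading all models.
  UnloadAllModels();
}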

@kthui force-pushed the jacky-server-stop branch from 90f3f53 to f2bead5 on January 27, 2024 03:00
@kthui requested review from rmccorm4 and GuanLuo on January 27, 2024 03:02
@kthui marked this pull request as ready for review on January 27, 2024 03:02
// Get a set of all models, and make sure none of them are loading/unloading.
std::set<ModelIdentifier> all_models;
bool all_models_locked = false;
while (!all_models_locked) {
Contributor
Discussed a possible simplification of reusing existing functions like LoadUnloadModels so that global state is updated after this function completes. We didn't want to leave global state in a weird place after unloading everything, in case this function is ever used for purposes other than shutdown, or in case extra explicit load/unload calls slip in after it; keeping global state correct reduces possible bugs/errors.

RETURN_IF_ERROR(LoadUnloadModels(
models, ActionType::UNLOAD, true /* unload_dependents */, &polled,
&no_parallel_conflict));
} while (!no_parallel_conflict);
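For context, the call above sits inside a retry loop. A reconstruction with the assumed surrounding declarations (only the LoadUnloadModels call and the closing while appear in the diff; the flag types are assumptions):

bool polled = true;
bool no_parallel_conflict = true;
do {
  // Retry the unload-everything request until no other load/unload
  // operation conflicts with it; LoadUnloadModels fills in the flags.
  RETURN_IF_ERROR(LoadUnloadModels(
      models, ActionType::UNLOAD, true /* unload_dependents */, &polled,
      &no_parallel_conflict));
} while (!no_parallel_conflict);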
Contributor

Is there a case where this while loop can never exit? Also does it now cause issues possibly disregarding the "30 second timeout"?

Contributor Author

Is there a case where this while loop can never exit?

If a model is stuck loading or unloading, or there is a bug elsewhere in model_repository_manager, then this loop will never exit, but neither case is expected.

Also does it now cause issues possibly disregarding the "30 second timeout"?

If a model is loading, then the loading thread will stop the timeout from counting forward until the unload can begin.

Contributor Author

Also does it now cause issues possibly disregarding the "30 second timeout"?

Updated: the commit below changes this so the wait respects the server shutdown timeout, i.e. the countdown keeps running while a model finishes loading.

Respect server shutdown timeout when waiting for load
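A hedged sketch of that behavior (hypothetical names again; per the logs below, the real countdown and warnings live in server.cc):

#include <chrono>
#include <cstdio>
#include <thread>

bool AnyModelTransitioning() { return false; }  // stub; assumed lifecycle query
void UnloadAllModels() {}                       // stub; assumed unload entry point

void ShutdownWithTimeout(int timeout_secs)
{
  // The countdown keeps running even while a model is still loading.
  for (; timeout_secs > 0; --timeout_secs) {
    if (AnyModelTransitioning()) {
      std::printf(
          "Timeout %d: a related model is currently loading or unloading\n",
          timeout_secs);
    } else {
      UnloadAllModels();  // all models stopped, begin unloading
      break;
    }
    std::this_thread::sleep_for(std::chrono::seconds(1));
  }
}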

Contributor

So now it will be something like this?

30 second timeout ...
Error: found conflict model: abc (still loading)
29 second timeout ...
Error: found conflict model: abc (still loading)
28 second timeout ...
Error: found conflict model: abc (still loading)
27 second timeout ...
Error: found conflict model: abc (still loading)
### abc finished loading
Unloading abc...
26 second timeout ...
Live models found: 1 ... (abc unloading)
25 second timeout ...
Live models found: 1 ... (abc unloading)
...
Done

Can you share an example output of waiting for model to finish loading before unloading, if you have one?

Contributor Author

Sure.

Signal (15) received.
I0131 00:20:13.656730 128 server.cc:307] Waiting for in-flight requests to complete.
I0131 00:20:13.656747 128 model_lifecycle.cc:223] StopAllModels()
I0131 00:20:13.656757 128 model_lifecycle.cc:241] InflightStatus()
I0131 00:20:13.656764 128 server.cc:323] Timeout 30: Found 0 model versions that have in-flight inferences
I0131 00:20:13.656794 128 model_repository_manager.cc:791] Load/Unload conflict 'identity_fp32'
W0131 00:20:13.656813 128 server.cc:335] a related model 'identity_fp32' to a load/unload request is currently loading or unloading
I0131 00:20:14.656925 128 model_lifecycle.cc:241] InflightStatus()
I0131 00:20:14.656953 128 server.cc:323] Timeout 29: Found 0 model versions that have in-flight inferences
I0131 00:20:14.656977 128 model_repository_manager.cc:791] Load/Unload conflict 'identity_fp32'
W0131 00:20:14.656994 128 server.cc:335] a related model 'identity_fp32' to a load/unload request is currently loading or unloading
...
I0131 00:20:14.893524 128 stub_launcher.cc:253] Starting Python backend stub:  exec /opt/tritonserver/backends/python/triton_python_backend_stub /opt/tritonserver/qa/L0_lifecycle/models/identity_fp32/1/model.py triton_python_backend_shm_region_2 1048576 1048576 128 /opt/tritonserver/backends/python 336 identity_fp32_0_0 DEFAULT
...
I0131 00:20:24.658975 128 model_lifecycle.cc:241] InflightStatus()
I0131 00:20:24.659005 128 server.cc:323] Timeout 19: Found 0 model versions that have in-flight inferences
I0131 00:20:24.659027 128 model_repository_manager.cc:791] Load/Unload conflict 'identity_fp32'
W0131 00:20:24.659089 128 server.cc:335] a related model 'identity_fp32' to a load/unload request is currently loading or unloading
...
I0131 00:20:25.046048 128 model_lifecycle.cc:692] OnLoadComplete() 'identity_fp32' version 1
I0131 00:20:25.046091 128 model_lifecycle.cc:730] OnLoadFinal() 'identity_fp32' for all version(s)
I0131 00:20:25.046101 128 model_lifecycle.cc:835] successfully loaded 'identity_fp32'
...
I0131 00:20:25.659243 128 server.cc:323] Timeout 18: Found 0 model versions that have in-flight inferences
I0131 00:20:25.659277 128 model_lifecycle.cc:390] AsyncUnload() 'identity_fp32'
I0131 00:20:25.659350 128 server.cc:338] All models are stopped, unloading models
I0131 00:20:25.659359 128 model_lifecycle.cc:190] LiveModelStates()
I0131 00:20:25.659371 128 model_lifecycle.cc:265] BackgroundModelsSize()
I0131 00:20:25.659379 128 server.cc:347] Timeout 18: Found 1 live models and 0 in-flight non-inference requests
I0131 00:20:25.659387 128 server.cc:353] identity_fp32 v1: UNLOADING
I0131 00:20:25.659405 128 backend_model_instance.cc:795] Stopping backend thread for identity_fp32_0_0...
...
I0131 00:20:26.968432 128 python_be.cc:2342] TRITONBACKEND_ModelFinalize: delete model state
I0131 00:20:26.968516 128 model_lifecycle.cc:618] OnDestroy callback() 'identity_fp32' version 1
I0131 00:20:26.968526 128 model_lifecycle.cc:620] successfully unloaded 'identity_fp32' version 1
I0131 00:20:27.659742 128 model_lifecycle.cc:190] LiveModelStates()
I0131 00:20:27.659776 128 model_lifecycle.cc:265] BackgroundModelsSize()
I0131 00:20:27.659785 128 server.cc:347] Timeout 16: Found 0 live models and 0 in-flight non-inference requests
...
I0131 00:20:27.759414 128 backend_manager.cc:138] unloading backend 'python'
I0131 00:20:27.759452 128 python_be.cc:2299] TRITONBACKEND_Finalize: Start
I0131 00:20:27.759617 128 python_be.cc:2304] TRITONBACKEND_Finalize: End

Contributor

Can you remove --log-verbose output?

Contributor Author

Sure.

I0131 00:44:46.308855 760 grpc_server.cc:2519] Started GRPCInferenceService at 0.0.0.0:8001
I0131 00:44:46.309011 760 http_server.cc:4623] Started HTTPService at 0.0.0.0:8000
I0131 00:44:46.349928 760 http_server.cc:315] Started Metrics Service at 0.0.0.0:8002
I0131 00:44:47.173588 760 model_lifecycle.cc:469] loading: identity_fp32:1
Signal (15) received.
I0131 00:44:47.201645 760 server.cc:307] Waiting for in-flight requests to complete.
I0131 00:44:47.201669 760 server.cc:323] Timeout 30: Found 0 model versions that have in-flight inferences
W0131 00:44:47.201707 760 server.cc:335] a related model 'identity_fp32' to a load/unload request is currently loading or unloading
I0131 00:44:48.201804 760 server.cc:323] Timeout 29: Found 0 model versions that have in-flight inferences
W0131 00:44:48.201874 760 server.cc:335] a related model 'identity_fp32' to a load/unload request is currently loading or unloading
I0131 00:44:48.437924 760 python_be.cc:2363] TRITONBACKEND_ModelInstanceInitialize: identity_fp32_0_0 (CPU device 0)
I0131 00:44:49.202414 760 server.cc:323] Timeout 28: Found 0 model versions that have in-flight inferences
W0131 00:44:49.202512 760 server.cc:335] a related model 'identity_fp32' to a load/unload request is currently loading or unloading
I0131 00:44:50.202611 760 server.cc:323] Timeout 27: Found 0 model versions that have in-flight inferences
W0131 00:44:50.202681 760 server.cc:335] a related model 'identity_fp32' to a load/unload request is currently loading or unloading
I0131 00:44:51.202778 760 server.cc:323] Timeout 26: Found 0 model versions that have in-flight inferences
W0131 00:44:51.202845 760 server.cc:335] a related model 'identity_fp32' to a load/unload request is currently loading or unloading
I0131 00:44:52.202940 760 server.cc:323] Timeout 25: Found 0 model versions that have in-flight inferences
W0131 00:44:52.203004 760 server.cc:335] a related model 'identity_fp32' to a load/unload request is currently loading or unloading
I0131 00:44:53.203571 760 server.cc:323] Timeout 24: Found 0 model versions that have in-flight inferences
W0131 00:44:53.203797 760 server.cc:335] a related model 'identity_fp32' to a load/unload request is currently loading or unloading
I0131 00:44:54.204453 760 server.cc:323] Timeout 23: Found 0 model versions that have in-flight inferences
W0131 00:44:54.204681 760 server.cc:335] a related model 'identity_fp32' to a load/unload request is currently loading or unloading
I0131 00:44:55.205354 760 server.cc:323] Timeout 22: Found 0 model versions that have in-flight inferences
W0131 00:44:55.205582 760 server.cc:335] a related model 'identity_fp32' to a load/unload request is currently loading or unloading
I0131 00:44:56.206224 760 server.cc:323] Timeout 21: Found 0 model versions that have in-flight inferences
W0131 00:44:56.206454 760 server.cc:335] a related model 'identity_fp32' to a load/unload request is currently loading or unloading
I0131 00:44:57.207007 760 server.cc:323] Timeout 20: Found 0 model versions that have in-flight inferences
W0131 00:44:57.207090 760 server.cc:335] a related model 'identity_fp32' to a load/unload request is currently loading or unloading
I0131 00:44:58.207329 760 server.cc:323] Timeout 19: Found 0 model versions that have in-flight inferences
W0131 00:44:58.207406 760 server.cc:335] a related model 'identity_fp32' to a load/unload request is currently loading or unloading
I0131 00:44:58.592469 760 model_lifecycle.cc:835] successfully loaded 'identity_fp32'
I0131 00:44:59.208003 760 server.cc:323] Timeout 18: Found 0 model versions that have in-flight inferences
I0131 00:44:59.208247 760 server.cc:338] All models are stopped, unloading models
I0131 00:44:59.208263 760 server.cc:347] Timeout 18: Found 1 live models and 0 in-flight non-inference requests
I0131 00:45:00.208897 760 server.cc:347] Timeout 17: Found 1 live models and 0 in-flight non-inference requests
I0131 00:45:00.530814 760 model_lifecycle.cc:620] successfully unloaded 'identity_fp32' version 1
I0131 00:45:01.209074 760 server.cc:347] Timeout 16: Found 0 live models and 0 in-flight non-inference requests

@rmccorm4 (Contributor) left a comment

I think we can improve the wording of what's logged in a follow-up if needed; this pertains to a relatively non-standard case.

@kthui merged commit 9468f5f into main on Jan 31, 2024 (1 check passed)
@kthui deleted the jacky-server-stop branch on January 31, 2024 19:51