Server stop will wait until all loading models complete before unloading all models #324
Conversation
90f3f53 to f2bead5 (compare)
// Get a set of all models, and make sure none of them are loading/unloading.
std::set<ModelIdentifier> all_models;
bool all_models_locked = false;
while (!all_models_locked) {
We discussed a possible simplification: reuse existing functions like LoadUnloadModels so that the global state is updated after this function completes. I didn't want to leave the global state in an inconsistent place after unloading everything, in case this function is ever used for purposes other than shutdown, or in case extra explicit load/unload calls slip in after it. Keeping the global state correct reduces the chance of bugs.
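The "lock all models before unloading" idea discussed above can be sketched roughly as follows. This is a hypothetical, simplified illustration, not Triton's actual implementation: `ModelState`, `TryLockAllModels`, and the manager's internal representation are stand-ins for illustration only.

```cpp
// Sketch (hypothetical names): retry collecting the full model set until no
// model is in a loading/unloading transition, so the caller can unload
// everything from a consistent global state.
#include <set>
#include <string>
#include <utility>

enum class ModelState { kReady, kLoading, kUnloading };

// Stand-in for the manager's per-model state bookkeeping.
struct ModelRepositoryManager {
  std::set<std::pair<std::string, ModelState>> models_;

  // Succeeds only once every known model is out of transition; on any
  // conflict the caller should retry, mirroring the while loop above.
  bool TryLockAllModels(std::set<std::string>* all_models) {
    all_models->clear();
    for (const auto& [name, state] : models_) {
      if (state == ModelState::kLoading || state == ModelState::kUnloading) {
        return false;  // conflict: a model is still transitioning
      }
      all_models->insert(name);
    }
    return true;
  }
};
```

In the real code the retry also has to re-check the model set each iteration, since a load or unload finishing can change which models exist.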
RETURN_IF_ERROR(LoadUnloadModels(
    models, ActionType::UNLOAD, true /* unload_dependents */, &polled,
    &no_parallel_conflict));
} while (!no_parallel_conflict);
Is there a case where this while loop can never exit? Also does it now cause issues possibly disregarding the "30 second timeout"?
> Is there a case where this while loop can never exit?

If a model is stuck in loading/unloading, or there is a bug somewhere else in model_repository_manager, then this loop will never exit, but neither case is expected.

> Also does it now cause issues possibly disregarding the "30 second timeout"?

If a model is loading, then the loading thread will stop the timeout from counting forward until the unload can begin.
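One common way a loading thread can hold back a waiter like this is a condition variable that the shutdown path blocks on until the load finishes. The sketch below is a generic illustration of that pattern under stated assumptions, not Triton's actual synchronization code; `LoadGate`, `FinishLoad`, and `WaitForIdle` are hypothetical names.

```cpp
// Sketch (hypothetical): the shutdown path waits on a condition variable
// until no load is in progress, only then starting the unload/countdown.
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

struct LoadGate {
  std::mutex mu;
  std::condition_variable cv;
  bool loading = true;  // set to false by the loading thread when done

  // Called by the loading thread once the model load completes.
  void FinishLoad() {
    {
      std::lock_guard<std::mutex> lk(mu);
      loading = false;
    }
    cv.notify_all();
  }

  // Called by the shutdown path; blocks until the load completes or the
  // overall limit expires. Returns false on timeout.
  bool WaitForIdle(std::chrono::milliseconds limit) {
    std::unique_lock<std::mutex> lk(mu);
    return cv.wait_for(lk, limit, [this] { return !loading; });
  }
};
```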
> Also does it now cause issues possibly disregarding the "30 second timeout"?
>
> If a model is loading, then the loading thread will stop the timeout from counting forward until the unload can begin.
So now it will be something like this?
30 second timeout ...
Error: found conflict model: abc (still loading)
29 second timeout ...
Error: found conflict model: abc (still loading)
28 second timeout ...
Error: found conflict model: abc (still loading)
27 second timeout ...
Error: found conflict model: abc (still loading)
### abc finished loading
Unloading abc...
26 second timeout ...
Live models found: 1 ... (abc unloading)
25 second timeout ...
Live models found: 1 ... (abc unloading)
...
Done
Can you share an example output of waiting for model to finish loading before unloading, if you have one?
Sure.
Signal (15) received.
I0131 00:20:13.656730 128 server.cc:307] Waiting for in-flight requests to complete.
I0131 00:20:13.656747 128 model_lifecycle.cc:223] StopAllModels()
I0131 00:20:13.656757 128 model_lifecycle.cc:241] InflightStatus()
I0131 00:20:13.656764 128 server.cc:323] Timeout 30: Found 0 model versions that have in-flight inferences
I0131 00:20:13.656794 128 model_repository_manager.cc:791] Load/Unload conflict 'identity_fp32'
W0131 00:20:13.656813 128 server.cc:335] a related model 'identity_fp32' to a load/unload request is currently loading or unloading
I0131 00:20:14.656925 128 model_lifecycle.cc:241] InflightStatus()
I0131 00:20:14.656953 128 server.cc:323] Timeout 29: Found 0 model versions that have in-flight inferences
I0131 00:20:14.656977 128 model_repository_manager.cc:791] Load/Unload conflict 'identity_fp32'
W0131 00:20:14.656994 128 server.cc:335] a related model 'identity_fp32' to a load/unload request is currently loading or unloading
...
I0131 00:20:14.893524 128 stub_launcher.cc:253] Starting Python backend stub: exec /opt/tritonserver/backends/python/triton_python_backend_stub /opt/tritonserver/qa/L0_lifecycle/models/identity_fp32/1/model.py triton_python_backend_shm_region_2 1048576 1048576 128 /opt/tritonserver/backends/python 336 identity_fp32_0_0 DEFAULT
...
I0131 00:20:24.658975 128 model_lifecycle.cc:241] InflightStatus()
I0131 00:20:24.659005 128 server.cc:323] Timeout 19: Found 0 model versions that have in-flight inferences
I0131 00:20:24.659027 128 model_repository_manager.cc:791] Load/Unload conflict 'identity_fp32'
W0131 00:20:24.659089 128 server.cc:335] a related model 'identity_fp32' to a load/unload request is currently loading or unloading
...
I0131 00:20:25.046048 128 model_lifecycle.cc:692] OnLoadComplete() 'identity_fp32' version 1
I0131 00:20:25.046091 128 model_lifecycle.cc:730] OnLoadFinal() 'identity_fp32' for all version(s)
I0131 00:20:25.046101 128 model_lifecycle.cc:835] successfully loaded 'identity_fp32'
...
I0131 00:20:25.659243 128 server.cc:323] Timeout 18: Found 0 model versions that have in-flight inferences
I0131 00:20:25.659277 128 model_lifecycle.cc:390] AsyncUnload() 'identity_fp32'
I0131 00:20:25.659350 128 server.cc:338] All models are stopped, unloading models
I0131 00:20:25.659359 128 model_lifecycle.cc:190] LiveModelStates()
I0131 00:20:25.659371 128 model_lifecycle.cc:265] BackgroundModelsSize()
I0131 00:20:25.659379 128 server.cc:347] Timeout 18: Found 1 live models and 0 in-flight non-inference requests
I0131 00:20:25.659387 128 server.cc:353] identity_fp32 v1: UNLOADING
I0131 00:20:25.659405 128 backend_model_instance.cc:795] Stopping backend thread for identity_fp32_0_0...
...
I0131 00:20:26.968432 128 python_be.cc:2342] TRITONBACKEND_ModelFinalize: delete model state
I0131 00:20:26.968516 128 model_lifecycle.cc:618] OnDestroy callback() 'identity_fp32' version 1
I0131 00:20:26.968526 128 model_lifecycle.cc:620] successfully unloaded 'identity_fp32' version 1
I0131 00:20:27.659742 128 model_lifecycle.cc:190] LiveModelStates()
I0131 00:20:27.659776 128 model_lifecycle.cc:265] BackgroundModelsSize()
I0131 00:20:27.659785 128 server.cc:347] Timeout 16: Found 0 live models and 0 in-flight non-inference requests
...
I0131 00:20:27.759414 128 backend_manager.cc:138] unloading backend 'python'
I0131 00:20:27.759452 128 python_be.cc:2299] TRITONBACKEND_Finalize: Start
I0131 00:20:27.759617 128 python_be.cc:2304] TRITONBACKEND_Finalize: End
Can you share the output without --log-verbose?
Sure.
I0131 00:44:46.308855 760 grpc_server.cc:2519] Started GRPCInferenceService at 0.0.0.0:8001
I0131 00:44:46.309011 760 http_server.cc:4623] Started HTTPService at 0.0.0.0:8000
I0131 00:44:46.349928 760 http_server.cc:315] Started Metrics Service at 0.0.0.0:8002
I0131 00:44:47.173588 760 model_lifecycle.cc:469] loading: identity_fp32:1
Signal (15) received.
I0131 00:44:47.201645 760 server.cc:307] Waiting for in-flight requests to complete.
I0131 00:44:47.201669 760 server.cc:323] Timeout 30: Found 0 model versions that have in-flight inferences
W0131 00:44:47.201707 760 server.cc:335] a related model 'identity_fp32' to a load/unload request is currently loading or unloading
I0131 00:44:48.201804 760 server.cc:323] Timeout 29: Found 0 model versions that have in-flight inferences
W0131 00:44:48.201874 760 server.cc:335] a related model 'identity_fp32' to a load/unload request is currently loading or unloading
I0131 00:44:48.437924 760 python_be.cc:2363] TRITONBACKEND_ModelInstanceInitialize: identity_fp32_0_0 (CPU device 0)
I0131 00:44:49.202414 760 server.cc:323] Timeout 28: Found 0 model versions that have in-flight inferences
W0131 00:44:49.202512 760 server.cc:335] a related model 'identity_fp32' to a load/unload request is currently loading or unloading
I0131 00:44:50.202611 760 server.cc:323] Timeout 27: Found 0 model versions that have in-flight inferences
W0131 00:44:50.202681 760 server.cc:335] a related model 'identity_fp32' to a load/unload request is currently loading or unloading
I0131 00:44:51.202778 760 server.cc:323] Timeout 26: Found 0 model versions that have in-flight inferences
W0131 00:44:51.202845 760 server.cc:335] a related model 'identity_fp32' to a load/unload request is currently loading or unloading
I0131 00:44:52.202940 760 server.cc:323] Timeout 25: Found 0 model versions that have in-flight inferences
W0131 00:44:52.203004 760 server.cc:335] a related model 'identity_fp32' to a load/unload request is currently loading or unloading
I0131 00:44:53.203571 760 server.cc:323] Timeout 24: Found 0 model versions that have in-flight inferences
W0131 00:44:53.203797 760 server.cc:335] a related model 'identity_fp32' to a load/unload request is currently loading or unloading
I0131 00:44:54.204453 760 server.cc:323] Timeout 23: Found 0 model versions that have in-flight inferences
W0131 00:44:54.204681 760 server.cc:335] a related model 'identity_fp32' to a load/unload request is currently loading or unloading
I0131 00:44:55.205354 760 server.cc:323] Timeout 22: Found 0 model versions that have in-flight inferences
W0131 00:44:55.205582 760 server.cc:335] a related model 'identity_fp32' to a load/unload request is currently loading or unloading
I0131 00:44:56.206224 760 server.cc:323] Timeout 21: Found 0 model versions that have in-flight inferences
W0131 00:44:56.206454 760 server.cc:335] a related model 'identity_fp32' to a load/unload request is currently loading or unloading
I0131 00:44:57.207007 760 server.cc:323] Timeout 20: Found 0 model versions that have in-flight inferences
W0131 00:44:57.207090 760 server.cc:335] a related model 'identity_fp32' to a load/unload request is currently loading or unloading
I0131 00:44:58.207329 760 server.cc:323] Timeout 19: Found 0 model versions that have in-flight inferences
W0131 00:44:58.207406 760 server.cc:335] a related model 'identity_fp32' to a load/unload request is currently loading or unloading
I0131 00:44:58.592469 760 model_lifecycle.cc:835] successfully loaded 'identity_fp32'
I0131 00:44:59.208003 760 server.cc:323] Timeout 18: Found 0 model versions that have in-flight inferences
I0131 00:44:59.208247 760 server.cc:338] All models are stopped, unloading models
I0131 00:44:59.208263 760 server.cc:347] Timeout 18: Found 1 live models and 0 in-flight non-inference requests
I0131 00:45:00.208897 760 server.cc:347] Timeout 17: Found 1 live models and 0 in-flight non-inference requests
I0131 00:45:00.530814 760 model_lifecycle.cc:620] successfully unloaded 'identity_fp32' version 1
I0131 00:45:01.209074 760 server.cc:347] Timeout 16: Found 0 live models and 0 in-flight non-inference requests
I think we can improve the wording of what's logged in a follow-up if needed; this pertains to a relatively non-standard case.
Related PR: triton-inference-server/server#6837
When the server is stopping and trying to unload all models, the unload will wait until no model is in transition (i.e. loading/unloading) before starting.
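The shutdown sequence visible in the logs above can be sketched roughly as follows. This is a deliberately simplified, hypothetical model of the behavior (each loop iteration stands in for one second of the countdown; `StopAllModels` and the `Model` struct are illustration-only names), not the server's actual code.

```cpp
// Sketch (hypothetical, simplified): each tick decrements the timeout; the
// unload of all models only starts once no model is in a load/unload
// transition, then the loop waits for live models to drain.
#include <string>
#include <vector>

struct Model {
  std::string name;
  bool in_transition;  // currently loading or unloading
  bool live;           // still holding resources
};

// Returns the number of seconds consumed before all models drained, or -1
// if the timeout expired first.
int StopAllModels(std::vector<Model>& models, int timeout_secs) {
  bool unload_started = false;
  for (int remaining = timeout_secs; remaining > 0; --remaining) {
    bool conflict = false;
    for (const auto& m : models) conflict = conflict || m.in_transition;
    if (!unload_started && !conflict) {
      // All models are out of transition: kick off the unloads.
      for (auto& m : models) m.live = false;  // stands in for an async unload
      unload_started = true;
    }
    bool any_live = false;
    for (const auto& m : models) any_live = any_live || m.live;
    if (unload_started && !any_live) return timeout_secs - remaining;
    // Simulate one in-flight transition finishing during the 1s wait.
    for (auto& m : models) {
      if (m.in_transition) {
        m.in_transition = false;
        break;
      }
    }
  }
  return -1;
}
```

This mirrors the log sequence above: the countdown keeps ticking while the load/unload conflict persists, and the "All models are stopped, unloading models" step only happens once the conflict clears.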