
Server stop will wait for all loading models to complete before unloading all models #324

Merged: 4 commits from jacky-server-stop into main on Jan 31, 2024

Conversation

@kthui (Contributor) commented on Jan 26, 2024

Related PR: triton-inference-server/server#6837

When the server is stopping and trying to unload all models, the unload-all step will wait until no model is in transition (i.e. loading or unloading) before starting the unload.
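A minimal sketch of the ordering this PR introduces, assuming hypothetical names (AnyModelTransitioning() and UnloadAllModels() stand in for the real model_repository_manager internals):

#include <atomic>
#include <chrono>
#include <thread>

std::atomic<int> models_in_transition{0};  // bumped while any model loads/unloads

bool AnyModelTransitioning() { return models_in_transition.load() > 0; }
void UnloadAllModels() { /* unload every registered model */ }

void StopAndUnloadAllModels()
{
  // Wait until no model is loading or unloading...
  while (AnyModelTransitioning()) {
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
  }
  // ...and only then begin unloading all models.
  UnloadAllModels();
}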

@kthui force-pushed the jacky-server-stop branch from 90f3f53 to f2bead5 on January 27, 2024 03:00
@kthui requested review from rmccorm4 and GuanLuo on January 27, 2024 03:02
@kthui marked this pull request as ready for review on January 27, 2024 03:02
// Get a set of all models, and make sure none of them are loading/unloading.
std::set<ModelIdentifier> all_models;
bool all_models_locked = false;
while (!all_models_locked) {
Contributor
Discussed a possible simplification of reusing existing functions like LoadUnloadModels so that global state is updated after this function completes. We didn't want to leave global state in a weird place after unloading everything, in case this function is ever used for purposes other than shutdown, or in case extra explicit load/unload calls slip in after it; keeping global state correct reduces possible bugs/errors.

RETURN_IF_ERROR(LoadUnloadModels(
models, ActionType::UNLOAD, true /* unload_dependents */, &polled,
&no_parallel_conflict));
} while (!no_parallel_conflict);
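For context, the call above sits inside a retry loop. A reconstruction with the assumed surrounding declarations (only the LoadUnloadModels call and the closing while appear in the diff; the flag types are assumptions):

bool polled = true;
bool no_parallel_conflict = true;
do {
  // Retry the unload-everything request until no other load/unload
  // operation conflicts with it; LoadUnloadModels fills in the flags.
  RETURN_IF_ERROR(LoadUnloadModels(
      models, ActionType::UNLOAD, true /* unload_dependents */, &polled,
      &no_parallel_conflict));
} while (!no_parallel_conflict);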
Contributor

Is there a case where this while loop can never exit? Also does it now cause issues possibly disregarding the "30 second timeout"?

Contributor Author

Is there a case where this while loop can never exit?

If a model is stuck loading or unloading, or there is a bug elsewhere in model_repository_manager, then this loop will never exit, but neither case is expected.

Also does it now cause issues possibly disregarding the "30 second timeout"?

If a model is loading, then the loading thread will stop the timeout from counting forward until the unload can begin.

Contributor Author

Also does it now cause issues possibly disregarding the "30 second timeout"?

Updated: the commit below changes this so the wait respects the server shutdown timeout, i.e. the countdown keeps running while a model finishes loading.

Respect server shutdown timeout when waiting for load
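A hedged sketch of that behavior (hypothetical names again; per the logs below, the real countdown and warnings live in server.cc):

#include <chrono>
#include <cstdio>
#include <thread>

bool AnyModelTransitioning() { return false; }  // stub; assumed lifecycle query
void UnloadAllModels() {}                       // stub; assumed unload entry point

void ShutdownWithTimeout(int timeout_secs)
{
  // The countdown keeps running even while a model is still loading.
  for (; timeout_secs > 0; --timeout_secs) {
    if (AnyModelTransitioning()) {
      std::printf(
          "Timeout %d: a related model is currently loading or unloading\n",
          timeout_secs);
    } else {
      UnloadAllModels();  // all models stopped, begin unloading
      break;
    }
    std::this_thread::sleep_for(std::chrono::seconds(1));
  }
}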

Contributor

So now it will be something like this?

30 second timeout ...
Error: found conflict model: abc (still loading)
29 second timeout ...
Error: found conflict model: abc (still loading)
28 second timeout ...
Error: found conflict model: abc (still loading)
27 second timeout ...
Error: found conflict model: abc (still loading)
### abc finished loading
Unloading abc...
26 second timeout ...
Live models found: 1 ... (abc unloading)
25 second timeout ...
Live models found: 1 ... (abc unloading)
...
Done

Can you share an example output of waiting for model to finish loading before unloading, if you have one?

Contributor Author

Sure.

Signal (15) received.
I0131 00:20:13.656730 128 server.cc:307] Waiting for in-flight requests to complete.
I0131 00:20:13.656747 128 model_lifecycle.cc:223] StopAllModels()
I0131 00:20:13.656757 128 model_lifecycle.cc:241] InflightStatus()
I0131 00:20:13.656764 128 server.cc:323] Timeout 30: Found 0 model versions that have in-flight inferences
I0131 00:20:13.656794 128 model_repository_manager.cc:791] Load/Unload conflict 'identity_fp32'
W0131 00:20:13.656813 128 server.cc:335] a related model 'identity_fp32' to a load/unload request is currently loading or unloading
I0131 00:20:14.656925 128 model_lifecycle.cc:241] InflightStatus()
I0131 00:20:14.656953 128 server.cc:323] Timeout 29: Found 0 model versions that have in-flight inferences
I0131 00:20:14.656977 128 model_repository_manager.cc:791] Load/Unload conflict 'identity_fp32'
W0131 00:20:14.656994 128 server.cc:335] a related model 'identity_fp32' to a load/unload request is currently loading or unloading
...
I0131 00:20:14.893524 128 stub_launcher.cc:253] Starting Python backend stub:  exec /opt/tritonserver/backends/python/triton_python_backend_stub /opt/tritonserver/qa/L0_lifecycle/models/identity_fp32/1/model.py triton_python_backend_shm_region_2 1048576 1048576 128 /opt/tritonserver/backends/python 336 identity_fp32_0_0 DEFAULT
...
I0131 00:20:24.658975 128 model_lifecycle.cc:241] InflightStatus()
I0131 00:20:24.659005 128 server.cc:323] Timeout 19: Found 0 model versions that have in-flight inferences
I0131 00:20:24.659027 128 model_repository_manager.cc:791] Load/Unload conflict 'identity_fp32'
W0131 00:20:24.659089 128 server.cc:335] a related model 'identity_fp32' to a load/unload request is currently loading or unloading
...
I0131 00:20:25.046048 128 model_lifecycle.cc:692] OnLoadComplete() 'identity_fp32' version 1
I0131 00:20:25.046091 128 model_lifecycle.cc:730] OnLoadFinal() 'identity_fp32' for all version(s)
I0131 00:20:25.046101 128 model_lifecycle.cc:835] successfully loaded 'identity_fp32'
...
I0131 00:20:25.659243 128 server.cc:323] Timeout 18: Found 0 model versions that have in-flight inferences
I0131 00:20:25.659277 128 model_lifecycle.cc:390] AsyncUnload() 'identity_fp32'
I0131 00:20:25.659350 128 server.cc:338] All models are stopped, unloading models
I0131 00:20:25.659359 128 model_lifecycle.cc:190] LiveModelStates()
I0131 00:20:25.659371 128 model_lifecycle.cc:265] BackgroundModelsSize()
I0131 00:20:25.659379 128 server.cc:347] Timeout 18: Found 1 live models and 0 in-flight non-inference requests
I0131 00:20:25.659387 128 server.cc:353] identity_fp32 v1: UNLOADING
I0131 00:20:25.659405 128 backend_model_instance.cc:795] Stopping backend thread for identity_fp32_0_0...
...
I0131 00:20:26.968432 128 python_be.cc:2342] TRITONBACKEND_ModelFinalize: delete model state
I0131 00:20:26.968516 128 model_lifecycle.cc:618] OnDestroy callback() 'identity_fp32' version 1
I0131 00:20:26.968526 128 model_lifecycle.cc:620] successfully unloaded 'identity_fp32' version 1
I0131 00:20:27.659742 128 model_lifecycle.cc:190] LiveModelStates()
I0131 00:20:27.659776 128 model_lifecycle.cc:265] BackgroundModelsSize()
I0131 00:20:27.659785 128 server.cc:347] Timeout 16: Found 0 live models and 0 in-flight non-inference requests
...
I0131 00:20:27.759414 128 backend_manager.cc:138] unloading backend 'python'
I0131 00:20:27.759452 128 python_be.cc:2299] TRITONBACKEND_Finalize: Start
I0131 00:20:27.759617 128 python_be.cc:2304] TRITONBACKEND_Finalize: End

Contributor

Can you remove --log-verbose output?

Contributor Author

Sure.

I0131 00:44:46.308855 760 grpc_server.cc:2519] Started GRPCInferenceService at 0.0.0.0:8001
I0131 00:44:46.309011 760 http_server.cc:4623] Started HTTPService at 0.0.0.0:8000
I0131 00:44:46.349928 760 http_server.cc:315] Started Metrics Service at 0.0.0.0:8002
I0131 00:44:47.173588 760 model_lifecycle.cc:469] loading: identity_fp32:1
Signal (15) received.
I0131 00:44:47.201645 760 server.cc:307] Waiting for in-flight requests to complete.
I0131 00:44:47.201669 760 server.cc:323] Timeout 30: Found 0 model versions that have in-flight inferences
W0131 00:44:47.201707 760 server.cc:335] a related model 'identity_fp32' to a load/unload request is currently loading or unloading
I0131 00:44:48.201804 760 server.cc:323] Timeout 29: Found 0 model versions that have in-flight inferences
W0131 00:44:48.201874 760 server.cc:335] a related model 'identity_fp32' to a load/unload request is currently loading or unloading
I0131 00:44:48.437924 760 python_be.cc:2363] TRITONBACKEND_ModelInstanceInitialize: identity_fp32_0_0 (CPU device 0)
I0131 00:44:49.202414 760 server.cc:323] Timeout 28: Found 0 model versions that have in-flight inferences
W0131 00:44:49.202512 760 server.cc:335] a related model 'identity_fp32' to a load/unload request is currently loading or unloading
I0131 00:44:50.202611 760 server.cc:323] Timeout 27: Found 0 model versions that have in-flight inferences
W0131 00:44:50.202681 760 server.cc:335] a related model 'identity_fp32' to a load/unload request is currently loading or unloading
I0131 00:44:51.202778 760 server.cc:323] Timeout 26: Found 0 model versions that have in-flight inferences
W0131 00:44:51.202845 760 server.cc:335] a related model 'identity_fp32' to a load/unload request is currently loading or unloading
I0131 00:44:52.202940 760 server.cc:323] Timeout 25: Found 0 model versions that have in-flight inferences
W0131 00:44:52.203004 760 server.cc:335] a related model 'identity_fp32' to a load/unload request is currently loading or unloading
I0131 00:44:53.203571 760 server.cc:323] Timeout 24: Found 0 model versions that have in-flight inferences
W0131 00:44:53.203797 760 server.cc:335] a related model 'identity_fp32' to a load/unload request is currently loading or unloading
I0131 00:44:54.204453 760 server.cc:323] Timeout 23: Found 0 model versions that have in-flight inferences
W0131 00:44:54.204681 760 server.cc:335] a related model 'identity_fp32' to a load/unload request is currently loading or unloading
I0131 00:44:55.205354 760 server.cc:323] Timeout 22: Found 0 model versions that have in-flight inferences
W0131 00:44:55.205582 760 server.cc:335] a related model 'identity_fp32' to a load/unload request is currently loading or unloading
I0131 00:44:56.206224 760 server.cc:323] Timeout 21: Found 0 model versions that have in-flight inferences
W0131 00:44:56.206454 760 server.cc:335] a related model 'identity_fp32' to a load/unload request is currently loading or unloading
I0131 00:44:57.207007 760 server.cc:323] Timeout 20: Found 0 model versions that have in-flight inferences
W0131 00:44:57.207090 760 server.cc:335] a related model 'identity_fp32' to a load/unload request is currently loading or unloading
I0131 00:44:58.207329 760 server.cc:323] Timeout 19: Found 0 model versions that have in-flight inferences
W0131 00:44:58.207406 760 server.cc:335] a related model 'identity_fp32' to a load/unload request is currently loading or unloading
I0131 00:44:58.592469 760 model_lifecycle.cc:835] successfully loaded 'identity_fp32'
I0131 00:44:59.208003 760 server.cc:323] Timeout 18: Found 0 model versions that have in-flight inferences
I0131 00:44:59.208247 760 server.cc:338] All models are stopped, unloading models
I0131 00:44:59.208263 760 server.cc:347] Timeout 18: Found 1 live models and 0 in-flight non-inference requests
I0131 00:45:00.208897 760 server.cc:347] Timeout 17: Found 1 live models and 0 in-flight non-inference requests
I0131 00:45:00.530814 760 model_lifecycle.cc:620] successfully unloaded 'identity_fp32' version 1
I0131 00:45:01.209074 760 server.cc:347] Timeout 16: Found 0 live models and 0 in-flight non-inference requests

@rmccorm4 (Contributor) left a comment

I think we can improve the wording of what's logged in a follow-up if needed; this pertains to a relatively non-standard case.

@kthui merged commit 9468f5f into main on Jan 31, 2024 (1 check passed)
@kthui deleted the jacky-server-stop branch on January 31, 2024 19:51