Add retry on model loading. Expose option to set model retry count #308
Conversation
@@ -1282,6 +1282,12 @@ class PyServerOptions : public PyWrapper<struct TRITONSERVER_ServerOptions> {
        triton_object_, thread_count));
  }

  void SetModelLoadRetryCount(unsigned int retry_count)
Unrelated, but do you think we could have some kind of test that asserts the python bindings fully cover the C API?
I'm thinking long term, folks may not always remember to add the binding equivalent for any API changes, and it might be good to automate that check.
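For context on what the new binding likely looks like beyond the signature shown above, here is a minimal sketch that mirrors the existing SetModelLoadThreadCount wrapper. The C API name TRITONSERVER_ServerOptionsSetModelLoadRetryCount and the ThrowIfError helper are assumptions inferred from the surrounding code, not confirmed by this hunk.

  // Hypothetical sketch only; mirrors the neighboring setter wrappers.
  // Assumes the new C API entry point is named
  // TRITONSERVER_ServerOptionsSetModelLoadRetryCount and that ThrowIfError is
  // the error-handling helper used by the other PyServerOptions methods.
  void SetModelLoadRetryCount(unsigned int retry_count)
  {
    ThrowIfError(TRITONSERVER_ServerOptionsSetModelLoadRetryCount(
        triton_object_, retry_count));
  }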
    CreateModel(model_id, version, model_info, is_config_provided);
    // The model state is changed away from LOADING if the load failed,
    // so the model loaded successfully if the state is still LOADING.
    if (model_info->state_ == ModelReadyState::LOADING) {
Question: is there a pause between retry attempts, or is one needed?
Why would one be needed?
Typically you retry after some delay, instead of retrying immediately, to give a possible transient error time to resolve. I'm not familiar with the underlying issue reported here, though, so I'm mainly wondering whether that makes sense in this case.
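To make the suggestion concrete, here is a generic retry-with-delay sketch, not the PR's actual implementation; the helper name, the fixed delay, and the callable-based interface are all illustrative assumptions.

#include <chrono>
#include <functional>
#include <thread>

// Illustrative helper: invoke `attempt` up to 1 + max_retries times, sleeping
// `delay` between attempts so a transient failure has time to clear.
// Returns true as soon as an attempt reports success.
bool
RetryWithDelay(
    const std::function<bool()>& attempt, unsigned int max_retries,
    std::chrono::milliseconds delay)
{
  for (unsigned int i = 0; i <= max_retries; ++i) {
    if (attempt()) {
      return true;
    }
    if (i < max_retries) {
      std::this_thread::sleep_for(delay);
    }
  }
  return false;
}

In the hunk above, `attempt` would roughly correspond to calling CreateModel and then checking whether model_info->state_ is still ModelReadyState::LOADING.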
@@ -1978,6 +1978,15 @@ TRITONSERVER_DECLSPEC struct TRITONSERVER_Error*
TRITONSERVER_ServerOptionsSetModelLoadThreadCount(
    struct TRITONSERVER_ServerOptions* options, unsigned int thread_count);

/// Set the number of retries when loading a model in a server options object.
It would be good to mention what the default is.
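As an illustration, the documentation could spell out the default along these lines. The exact signature and the default of 0 (no retries) are assumptions for the sketch, not taken from this hunk.

/// Set the number of retries when loading a model in a server options object.
/// Default is 0, i.e. a failed model load is not retried.
/// (Signature and default shown here are assumptions for illustration.)
///
/// \param options The server options object.
/// \param retry_count The number of retries allowed for a failed load.
/// \return a TRITONSERVER_Error indicating success or failure.
TRITONSERVER_DECLSPEC struct TRITONSERVER_Error*
TRITONSERVER_ServerOptionsSetModelLoadRetryCount(
    struct TRITONSERVER_ServerOptions* options, unsigned int retry_count);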
  const double min_compute_capability;
  // The backend configuration settings specified on the command-line
- const triton::common::BackendCmdlineConfigMap& backend_cmdline_config_map_;
+ const triton::common::BackendCmdlineConfigMap& backend_cmdline_config_map;
  // The host policy setting used when loading models.
- const triton::common::HostPolicyCmdlineConfigMap& host_policy_map_;
+ const triton::common::HostPolicyCmdlineConfigMap& host_policy_map;
  // Number of threads to use for concurrently loading models
- const unsigned int model_load_thread_count_;
+ const unsigned int model_load_thread_count;
Nice refactoring!