Hi all, I tried the demo under "tutorial/full_gpu_inference_pipeline" without luck.

1. Errors occurred

Following the official steps, I successfully started the Triton server and the Triton client, but during the benchmark, when I ran

perf_analyzer -m spleen_seg -u localhost:18100 --input-data zero --shape "INPUT0":512,512,114 --shared-memory system

the following errors occurred.

# server side
I0106 14:36:26.742519 1279 grpc_server.cc:4190] Started GRPCInferenceService at 0.0.0.0:8001
I0106 14:36:26.743364 1279 http_server.cc:2857] Started HTTPService at 0.0.0.0:8000
I0106 14:36:26.785051 1279 http_server.cc:167] Started Metrics Service at 0.0.0.0:8002
2023-01-06 14:36:43,063 - the shape of the input tensor is: torch.Size([1, 512, 512, 114])
2023-01-06 14:36:44,825 - the shape of the transformed tensor is: torch.Size([1, 224, 224, 224])
2023-01-06 14:36:44,826 - the shape of the unsqueezed transformed tensor is: torch.Size([1, 1, 224, 224, 224])
E0106 14:36:46.006760 1279 python.cc:1970] Stub process is unhealthy and it will be restarted.
# client side
*** Measurement Settings ***
Batch size: 1
Using "time_windows" mode for stabilization
Measurement window: 5000 msec
Using synchronous calls for inference
Stabilizing using average latency
Request concurrency: 1
Failed to maintain requested inference load. Worker thread(s) failed to generate concurrent requests.
Thread [0] had error: Failed to process the request(s) for model instance 'spleen_seg_0', message: Stub process is not healthy.

2. Tried methods

Later I also tried changing some parameters, e.g. running without shared memory, increasing shm-size from 1g to 16g, installing the MONAI environment inside the server instead of using conda-pack, changing the Docker image version, etc., but all of these attempts failed. When I tried tritonserver 22.12, the following errors occurred.

# server side
I0106 14:47:40.332535 94 grpc_server.cc:4819] Started GRPCInferenceService at 0.0.0.0:8001
I0106 14:47:40.332862 94 http_server.cc:3477] Started HTTPService at 0.0.0.0:8000
I0106 14:47:40.373862 94 http_server.cc:184] Started Metrics Service at 0.0.0.0:8002
2023-01-06 14:49:07,524 - the shape of the input tensor is: torch.Size([1, 512, 512, 114])
2023-01-06 14:49:09,501 - the shape of the transformed tensor is: torch.Size([1, 224, 224, 224])
2023-01-06 14:49:09,501 - the shape of the unsqueezed transformed tensor is: torch.Size([1, 1, 224, 224, 224])
# client side
*** Measurement Settings ***
Batch size: 1
Service Kind: Triton
Using "time_windows" mode for stabilization
Measurement window: 5000 msec
Using synchronous calls for inference
Stabilizing using average latency
Request concurrency: 1
Failed to maintain requested inference load. Worker thread(s) failed to generate concurrent requests.
Thread [0] had error: Failed to process the request(s) for model instance 'spleen_seg_0', message: TritonModelException: DLPack tensor is not contiguous. Only contiguous DLPack tensors that are stored in C-Order are supported.
At:
/triton_monai/spleen_seg/1/model.py(131): execute

3. Some assumptions

From the above errors, it can be inferred that the "DLPack tensor is not contiguous" message and the traceback line pointing at model.py hit the core issue.
It is related to the following code in the model repository mentioned in the tutorial (which needs to be downloaded first), at "/triton_monai/spleen_seg/1/model.py(131): execute":

# get the input by name (as configured in config.pbtxt)
input_triton_tensor = pb_utils.get_input_tensor_by_name(request, "INPUT0")
input_torch_tensor = from_dlpack(input_triton_tensor.to_dlpack())
logger.info(f"the shape of the input tensor is: {input_torch_tensor.shape}")
transform_output = self.pre_transforms(input_torch_tensor[0])
logger.info(f"the shape of the transformed tensor is: {transform_output.shape}")
transform_output_batched = transform_output.unsqueeze(0)
logger.info(f"the shape of the unsqueezed transformed tensor is: {transform_output_batched.shape}")
# if(transform_output_batched.is_cuda):
# print("the transformed pytorch tensor is on GPU")
# print(transform_output.shape)
transform_tensor = pb_utils.Tensor.from_dlpack("INPUT__0", to_dlpack(transform_output_batched))

Apparently, the last line of code fails to run, but I don't know how to modify it. Any suggestions or updates to the "model.py" script?
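For reference, here is a minimal sketch of the kind of change I have been considering, assuming the root cause is simply that transform_output_batched ends up as a non-contiguous view after the MONAI pre-transforms; the .contiguous() call is my own guess and is untested:

```python
# Assumed imports, matching what model.py already uses
from torch.utils.dlpack import to_dlpack
import triton_python_backend_utils as pb_utils

# ... inside execute(), right before the failing line ...
transform_output_batched = transform_output.unsqueeze(0)

# Guess: the unsqueezed tensor is a non-contiguous view, and Triton's
# from_dlpack only accepts C-contiguous memory, so copy it into C-order first.
if not transform_output_batched.is_contiguous():
    transform_output_batched = transform_output_batched.contiguous()

transform_tensor = pb_utils.Tensor.from_dlpack("INPUT__0", to_dlpack(transform_output_batched))
```

Would something like this be the right direction, or is there a better way to hand the tensor back to Triton?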
Please refer to #1150.