Adding OpenTelemetry Batch Span Processor (#6842)

Co-authored-by: Theo Clark <theoclark101@gmail.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
3 people authored Feb 1, 2024
1 parent f345bbb commit 8f98789
Showing 8 changed files with 493 additions and 72 deletions.
82 changes: 81 additions & 1 deletion docs/user_guide/trace.md
@@ -1,5 +1,5 @@
<!--
-# Copyright 2019-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright 2019-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
@@ -456,6 +456,46 @@ flag as follows:
$ tritonserver --trace-config mode=opentelemetry \
--trace-config opentelemetry,url=<endpoint> ...
```

Triton's OpenTelemetry trace mode uses the
[Batch Span Processor](https://opentelemetry.io/docs/specs/otel/configuration/sdk-environment-variables/#batch-span-processor),
which batches ended spans and sends them in bulk. Batching helps
with data compression and reduces the number of outgoing connections
required to transmit the data. The processor supports both size- and
time-based batching. Size-based batching is controlled by two parameters,
`bsp_max_export_batch_size` and `bsp_max_queue_size`, while time-based batching
is controlled by `bsp_schedule_delay`. Collected spans are exported when
the batch size reaches `bsp_max_export_batch_size`, or when the delay since the
last export reaches `bsp_schedule_delay`, whichever comes first. Additionally,
make sure that `bsp_max_export_batch_size` is no greater than
`bsp_max_queue_size`; otherwise, excess spans will be dropped and trace data
will be lost.
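
For example, a minimal sketch of tuning these parameters at startup; the
endpoint and the values shown are placeholders, not recommendations:

```
$ tritonserver --trace-config mode=opentelemetry \
    --trace-config opentelemetry,url=<endpoint> \
    --trace-config opentelemetry,bsp_max_queue_size=4096 \
    --trace-config opentelemetry,bsp_max_export_batch_size=512 \
    --trace-config opentelemetry,bsp_schedule_delay=2000 ...
```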

Default values for the Batch Span Processor parameters are provided in
[`OpenTelemetry trace APIs settings`](#opentelemetry-trace-apis-settings).
As a general recommendation, make sure that `bsp_max_queue_size` is large enough
to hold all collected spans, and that `bsp_schedule_delay` does not cause overly
frequent exports, which can affect Triton Server's latency. A minimal Triton
trace consists of three spans: a top-level span, a model span, and a compute
span.

* __Top-level span__: The top-level span records the timestamps for when the
request was received by Triton and when the response was sent. Any Triton
trace contains exactly one top-level span.
* __Model span__: Model spans record when the request for a model was started,
when it was put in a queue, and when it ended.
A minimal Triton trace contains one model span.
* __Compute span__: Compute spans record compute timestamps. A minimal
Triton trace contains one compute span.

The total number of spans depends on the complexity of your model.
As a general rule, any base model (a single model that performs computations)
produces one model span and one compute span. For ensembles, every composing
model produces a model span and a compute span, in addition to one model span
for the ensemble itself; a worked example follows this paragraph.
[BLS](#tracing-for-bls-models) models produce the same number of model and
compute spans as the total number of models involved in the BLS request,
including the main BLS model.
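
As a minimal sketch of this counting rule (assuming the rule above; the model
count is illustrative only), an ensemble with two composing base models yields:

```
# Hypothetical example: expected span count for an ensemble
# with N composing base models.
N=2
TOP_LEVEL=1              # one top-level span per trace
ENSEMBLE_MODEL=1         # one model span for the ensemble itself
PER_MODEL=$(( N * 2 ))   # one model span + one compute span per base model
echo $(( TOP_LEVEL + ENSEMBLE_MODEL + PER_MODEL ))   # prints 6
```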


### Differences in trace contents from Triton's trace [output](#json-trace-output)

OpenTelemetry APIs produce [spans](https://opentelemetry.io/docs/concepts/observability-primer/#spans)
@@ -509,6 +549,46 @@ The following table shows available OpenTelemetry trace APIs settings for
environment variable.
</td>
</tr>
<tr>
<td><a href="https://opentelemetry.io/docs/specs/otel/trace/sdk/#batching-processor">
Batch Span Processor</a>
</td>
<td></td><td></td>
</tr>
<tr>
<td><code>bsp_max_queue_size</code></td>
<td align="center">2048</td>
<td>
Maximum queue size; spans collected beyond this limit are dropped. <br/>
This setting can also be specified through <br/>
<a href="https://opentelemetry.io/docs/specs/otel/configuration/sdk-environment-variables/#batch-span-processor">
OTEL_BSP_MAX_QUEUE_SIZE</a>
environment variable.
</td>
</tr>
<tr>
<td><code>bsp_schedule_delay</code></td>
<td align="center">5000</td>
<td>
Delay interval (in milliseconds) between two consecutive exports. <br/>
This setting can also be specified through <br/>
<a href="https://opentelemetry.io/docs/specs/otel/configuration/sdk-environment-variables/#batch-span-processor">
OTEL_BSP_SCHEDULE_DELAY</a>
environment variable.
</td>
</tr>
<tr>
<td><code>bsp_max_export_batch_size</code></td>
<td align="center">512</td>
<td>
Maximum batch size. Must be less than or equal to
<code>bsp_max_queue_size</code>.<br/>
This setting can also be specified through <br/>
<a href="https://opentelemetry.io/docs/specs/otel/configuration/sdk-environment-variables/#batch-span-processor">
OTEL_BSP_MAX_EXPORT_BATCH_SIZE</a>
environment variable.
</td>
</tr>
</tbody>
</table>
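
The same Batch Span Processor settings can also be supplied through the
OpenTelemetry environment variables linked in the table above. For example
(the values shown are illustrative):

```
$ export OTEL_BSP_MAX_QUEUE_SIZE=4096
$ export OTEL_BSP_SCHEDULE_DELAY=2000
$ export OTEL_BSP_MAX_EXPORT_BATCH_SIZE=512
$ tritonserver --trace-config mode=opentelemetry \
    --trace-config opentelemetry,url=<endpoint> ...
```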

98 changes: 96 additions & 2 deletions qa/L0_cmdline_trace/test.sh
@@ -1,5 +1,5 @@
#!/bin/bash
-# Copyright 2019-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright 2019-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
@@ -25,6 +25,19 @@
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

# ============================= Helpers =======================================
function assert_server_startup_failed() {
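    # run_server (from the shared QA test utilities) is expected to leave
    # SERVER_PID=0 when the server fails to start, so a non-zero PID here
    # means the server came up when it should not have.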
if [ "$SERVER_PID" != "0" ]; then
echo -e "\n***\n***Fail: Server start should have failed $SERVER\n***"
cat $SERVER_LOG
set -e
kill $SERVER_PID
wait $SERVER_PID
set +e
exit 1
fi
}

TRACE_SUMMARY=../common/trace_summary.py
CLIENT_SCRIPT=trace_client.py

@@ -618,11 +631,92 @@ set -e
kill $SERVER_PID
wait $SERVER_PID

set +e

################################################################################
# The following set of tests checks that tritonserver gracefully handles #
# bad OpenTelemetry BatchSpanProcessor parameters, provided through #
# environment variables, or tritonserver's options. #
################################################################################
export OTEL_BSP_MAX_QUEUE_SIZE="bad_value"

SERVER_ARGS="--trace-config mode=opentelemetry --model-repository=$MODELSDIR"
SERVER_LOG="./inference_server_trace_config_flag.log"
run_server
assert_server_startup_failed

if [ `grep -c "Bad option: \"OTEL_BSP_MAX_QUEUE_SIZE\"" $SERVER_LOG` != "1" ]; then
cat $SERVER_LOG
echo -e "\n***\n*** Test Failed\n***"
RET=1
fi

unset OTEL_BSP_MAX_QUEUE_SIZE

export OTEL_BSP_SCHEDULE_DELAY="bad_value"
run_server
assert_server_startup_failed

if [ `grep -c "Bad option: \"OTEL_BSP_SCHEDULE_DELAY\"" $SERVER_LOG` != "1" ]; then
cat $SERVER_LOG
echo -e "\n***\n*** Test Failed\n***"
RET=1
fi

unset OTEL_BSP_SCHEDULE_DELAY

export OTEL_BSP_MAX_EXPORT_BATCH_SIZE="bad_value"
run_server
assert_server_startup_failed

if [ `grep -c "Bad option: \"OTEL_BSP_MAX_EXPORT_BATCH_SIZE\"" $SERVER_LOG` != "1" ]; then
cat $SERVER_LOG
echo -e "\n***\n*** Test Failed\n***"
RET=1
fi

unset OTEL_BSP_MAX_EXPORT_BATCH_SIZE

SERVER_ARGS="--model-repository=$MODELSDIR --trace-config mode=opentelemetry \
--trace-config opentelemetry,bsp_max_queue_size=bad_value"
SERVER_LOG="./inference_server_trace_config_flag.log"
run_server
assert_server_startup_failed

if [ `grep -c "Bad option: \"--trace-config opentelemetry,bsp_max_queue_size\"" $SERVER_LOG` != "1" ]; then
cat $SERVER_LOG
echo -e "\n***\n*** Test Failed\n***"
RET=1
fi

SERVER_ARGS="--model-repository=$MODELSDIR --trace-config mode=opentelemetry \
--trace-config opentelemetry,bsp_schedule_delay=bad_value"
SERVER_LOG="./inference_server_trace_config_flag.log"
run_server
assert_server_startup_failed

if [ `grep -c "Bad option: \"--trace-config opentelemetry,bsp_schedule_delay\"" $SERVER_LOG` != "1" ]; then
cat $SERVER_LOG
echo -e "\n***\n*** Test Failed\n***"
RET=1
fi

SERVER_ARGS="--model-repository=$MODELSDIR --trace-config mode=opentelemetry \
--trace-config opentelemetry,bsp_max_export_batch_size=bad_value"
SERVER_LOG="./inference_server_trace_config_flag.log"
run_server
assert_server_startup_failed

if [ `grep -c "Bad option: \"--trace-config opentelemetry,bsp_max_export_batch_size\"" $SERVER_LOG` != "1" ]; then
cat $SERVER_LOG
echo -e "\n***\n*** Test Failed\n***"
RET=1
fi

if [ $RET -eq 0 ]; then
echo -e "\n***\n*** Test Passed\n***"
else
echo -e "\n***\n*** Test FAILED\n***"
fi


exit $RET
29 changes: 17 additions & 12 deletions qa/L0_trace/opentelemetry_unittest.py
@@ -1,4 +1,4 @@
-# Copyright 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright 2023-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
@@ -250,9 +250,9 @@ def _verify_contents(self, spans, expected_counts):
span_names = []
for span in spans:
# Check that collected spans have proper events recorded
-            span_name = span[0]["name"]
+            span_name = span["name"]
span_names.append(span_name)
-            span_events = span[0]["events"]
+            span_events = span["events"]
event_names_only = [event["name"] for event in span_events]
self._check_events(span_name, event_names_only)

@@ -283,13 +283,13 @@ def _verify_nesting(self, spans, expected_parent_span_dict):
"""
seen_spans = {}
for span in spans:
-            cur_span = span[0]["spanId"]
-            seen_spans[cur_span] = span[0]["name"]
+            cur_span = span["spanId"]
+            seen_spans[cur_span] = span["name"]

parent_child_dict = {}
for span in spans:
-            cur_parent = span[0]["parentSpanId"]
-            cur_span = span[0]["name"]
+            cur_parent = span["parentSpanId"]
+            cur_span = span["name"]
if cur_parent in seen_spans.keys():
parent_name = seen_spans[cur_parent]
if parent_name not in parent_child_dict:
@@ -377,16 +377,21 @@ def _test_trace(
"""
time.sleep(COLLECTOR_TIMEOUT)
traces = self._parse_trace_log(self.filename)
-        self.assertEqual(len(traces), 1, "Unexpected number of traces collected")
+        expected_traces_number = 1
+        self.assertEqual(
+            len(traces),
+            expected_traces_number,
+            "Unexpected number of traces collected. Expected {}, but got {}".format(
+                expected_traces_number, len(traces)
+            ),
+        )
self._test_resource_attributes(
traces[0]["resourceSpans"][0]["resource"]["attributes"]
)

-        parsed_spans = [
-            entry["scopeSpans"][0]["spans"] for entry in traces[0]["resourceSpans"]
-        ]
+        parsed_spans = traces[0]["resourceSpans"][0]["scopeSpans"][0]["spans"]
root_span = [
-            entry[0] for entry in parsed_spans if entry[0]["name"] == "InferRequest"
+            entry for entry in parsed_spans if entry["name"] == "InferRequest"
][0]
self.assertEqual(len(parsed_spans), expected_number_of_spans)

