[GPUM:] updated onboarding documentation for deployment with dd-operator (#20407)

val06 · mbertrone · urseberry · web-flow · commit 97ebe8eb5939 · 2025-05-30T17:40:31.000Z
* updated GPUM onboarding documentation for k8s deployments that use dd-operator

* Update gpu/README.md

Co-authored-by: Matteo Bertrone <m.bertrone@gmail.com>

* fixed indentation

* CR fixes

Co-authored-by: Ursula Chen <58821586+urseberry@users.noreply.github.com>

* added extra config flags to helm deployment section in the guide

---------

Co-authored-by: Matteo Bertrone <m.bertrone@gmail.com>
Co-authored-by: Ursula Chen <58821586+urseberry@users.noreply.github.com>
diff --git a/gpu/README.md b/gpu/README.md
@@ -108,6 +108,8 @@ For Helm configurations where all the nodes have GPUs, you can set up the Datado
 
 ```yaml
 datadog:
+  enable_nvml_detection: true
+  collect_gpu_tags: true
   gpuMonitoring:
     enabled: true
 ```
@@ -142,6 +144,8 @@ Additionally, if you need to select nodes based on the presence of a label key,
 # GPU-specific values-gpu.yaml (for GPU nodes)
 datadog:
   kubeStateMetricsEnabled: false # Disabled as we're joining an existing Cluster Agent
+  enable_nvml_detection: true
+  collect_gpu_tags: true
   gpuMonitoring:
     enabled: true
 
@@ -174,6 +178,8 @@ helm install -f values.yaml -f values-gpu.yaml datadog-gpu datadog
 
 #### Datadog Operator
 
+_**Minimum required operator version: 1.14**_
+
 To enable the GPU feature in clusters where all the nodes have GPUs, set the `features.gpu.enabled` parameter in the DatadogAgent manifest:
 
 ```yaml
@@ -185,6 +191,18 @@ spec:
   features:
     gpu:
       enabled: true
+  # for operator versions 1.14.x and 1.15.x, add this section
+  override:
+    nodeAgent:
+      containers:
+        agent:
+          env:
+            # add this env var if using operator version 1.14.x
+            - name: DD_ENABLE_NVML_DETECTION
+              value: "true"
+            # add this env var if using operator versions 1.14.x or 1.15.x
+            - name: DD_COLLECT_GPU_TAGS
+              value: "true"
 ```
 
 For **mixed environments**, use the [DatadogAgentProfiles feature](https://github.com/DataDog/datadog-operator/blob/main/docs/datadog_agent_profiles.md) of the operator, which allows different configurations to be deployed for different nodes. In this case, it is not necessary to modify the DatadogAgent manifest. Instead, create a profile that enables the configuration on GPU nodes only:
@@ -210,6 +228,14 @@ spec:
             env:
               - name: DD_GPU_MONITORING_ENABLED
                 value: "true"
+          # DD_ENABLE_NVML_DETECTION is only needed if using operator version 1.14.x
+          agent:
+            env:
+              - name: DD_ENABLE_NVML_DETECTION
+                value: "true"
+              # add this env var if using operator versions 1.14.x or 1.15.x
+              - name: DD_COLLECT_GPU_TAGS
+                value: "true"
 ```
 
 <!-- xxz tab xxx -->
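
The version comments in the operator hunks above imply that on operator 1.15.x only `DD_COLLECT_GPU_TAGS` still has to be set by hand, while `DD_ENABLE_NVML_DETECTION` applies to 1.14.x only. A minimal sketch of what the resulting all-GPU-nodes manifest might look like on 1.15.x, assuming the `datadoghq.com/v2alpha1` API version and a hypothetical resource name `datadog` (neither appears in this diff):

```yaml
apiVersion: datadoghq.com/v2alpha1  # assumption: not shown in the diff
kind: DatadogAgent
metadata:
  name: datadog  # hypothetical resource name
spec:
  features:
    gpu:
      enabled: true
  override:
    nodeAgent:
      containers:
        agent:
          env:
            # per the README comments, still needed on operator 1.15.x
            - name: DD_COLLECT_GPU_TAGS
              value: "true"
            # DD_ENABLE_NVML_DETECTION is left out: the README lists it for 1.14.x only
```

On 1.14.x both variables would be listed. The comments suggest the `override` section is unnecessary on versions after 1.15.x, though the diff does not state this outright.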