Description
What happened:
nfd-topology-updater
is used for report node's NUMA and Guaranteed
pods's cpu used. But it missed cpu for Guaranteed
pod which pod's have any no exclusiveCPU.
- Node
yj-kubevirtwork-001
kubelet config single-numa-node and cpu-manager
--cpu-manager-policy=static--topology-manager-policy=single-numa-node
I have two pods in node:
root@yj-kubevirtmaster-01 ~ # kubectl get po -owide nginx-deployment-68589dd8dc-kzvzh virt-launcher-kubevirttesttest1-p545b
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-deployment-68589dd8dc-kzvzh 1/1 Running 0 2d16h 10.106.66.199 yj-kubevirtwork-001 <none> <none>
virt-launcher-kubevirttesttest1-p545b 3/3 Running 0 4h20m 10.106.66.177 yj-kubevirtwork-001 <none> 1/1
In cpu-manager allocated cpu in node
cat /var/lib/kubelet/cpu_manager_state | jq
{
"policyName": "static",
"defaultCpuSet": "1,3,24-25,48-51,53,55,76-77,100-103",
"entries": {
"609526ad-0de0-4543-8188-48f872576fc9": {
"compute": "26-34,78-86"
},
"64dc405a-1a9c-4029-ba5f-778fe5948a10": {
"compute": "0,2,4-9,52,54,56-61"
},
"8310086e-bc8b-4ab3-ac6e-ccc1129fecec": {
"compute": "35-47,87-99"
},
"98f43841-5488-49d7-9f79-29fa68791316": {
"nginx": "14-15,66-67"
},
"b4fd6f12-9cf5-459e-8fdb-3058a2ae22a3": {
"compute": "10-13,16-23,62-65,68-75"
}
},
"checksum": 2485504922
}
But in noderesourcetopologies CR, it only report nginx's CPU set, not report kubevirt-launcher's CPU
kubectl get noderesourcetopologies yj-kubevirtwork-001 -oyaml
apiVersion: topology.node.k8s.io/v1alpha2
attributes:
- name: topologyManagerPolicy
value: single-numa-node
- name: topologyManagerScope
value: container
- name: nodeTopologyPodsFingerprint
value: pfp0v001828baea44880d152
kind: NodeResourceTopology
metadata:
creationTimestamp: "2024-12-05T13:13:40Z"
generation: 66528
name: yj-kubevirtwork-001
ownerReferences:
- apiVersion: v1
kind: Namespace
name: node-feature-discovery
uid: 4072a5ce-0634-447e-ab31-dd3f4bc7abac
resourceVersion: "27862374"
uid: dd26aaf4-3f38-4708-bb86-89305475a715
topologyPolicies:
- SingleNUMANodeContainerLevel
zones:
- costs:
- name: node-0
value: 21
- name: node-1
value: 10
name: node-1
resources:
- allocatable: "52"
available: "52"
capacity: "52"
name: cpu
- allocatable: "201847795712"
available: "157778243583"
capacity: "201847795712"
name: memory
type: Node
- costs:
- name: node-0
value: 10
- name: node-1
value: 21
name: node-0
resources:
- allocatable: "50"
available: "46"
capacity: "52"
name: cpu
- allocatable: "183845359616"
available: "130691427327"
capacity: "201130086400"
name: memory
type: Node
In nfd-topology-updater
logs:
I1213 06:06:41.791096 1 nfd-topology-updater.go:235] "received updated pod resources" podResources=<
- Containers:
- Name: nginx
Resources:
- Data:
- "66"
- "67"
- "14"
- "15"
Name: cpu
NumaNodeIds: null
- Data:
- "8589934592"
Name: memory
NumaNodeIds:
- 0
Name: nginx-deployment-68589dd8dc-kzvzh
Namespace: default
- Containers:
- Name: compute
Resources:
- Data:
- vhost-net172
Name: devices.kubevirt.io/vhost-net
NumaNodeIds: null
- Data:
- kvm760
Name: devices.kubevirt.io/kvm
NumaNodeIds: null
- Data:
- tun96
Name: devices.kubevirt.io/tun
NumaNodeIds: null
- Data:
- "9141485568"
Name: memory
NumaNodeIds:
- 0
- Name: volumecontainerdisk
Resources:
- Data:
- "40000000"
Name: memory
NumaNodeIds:
- 0
- Name: guest-console-log
Resources:
- Data:
- "60000000"
Name: memory
NumaNodeIds:
- 0
Name: virt-launcher-kubevirttesttest2-8dnfj
Namespace: default
>
nfd-topology-updater
not got kubevirt-launcher pod's compute container's CPU.
What you expected to happen:
nfd-topology-updater
can report all pod container's exclusiveCPU.
How to reproduce it (as minimally and precisely as possible):
For example, create a pod Pod-a
have 2 containers:
containers:
- name: ctr-a
resources:
limits:
cpu: 5
requests:
cpu: 5
- name: ctr-b
resources:
limits:
cpu: 500m
requests:
cpu: 500m
pod is still a Guaranteed pod , and allocate exclusiveCPU by kubelet's cpu-manager, but nfd-topology-updater
will not report it's CPU in topology CR.
Anything else we need to know?:
I thought this bug is cause by hasExclusiveCPUs
func:
node-feature-discovery/pkg/resourcemonitor/podresourcesscanner.go
Lines 78 to 107 in 1416072
if pod has any container not request cpu as integer, it will return false.
node-feature-discovery/pkg/resourcemonitor/podresourcesscanner.go
Lines 168 to 183 in 1416072
It cause nfd-topology-updater
skip this pod's all containers loop for report CPU.
Environment:
- Kubernetes version (use
kubectl version
): - Cloud provider or hardware configuration:
- OS (e.g:
cat /etc/os-release
): - Kernel (e.g.
uname -a
): - Install tools:
- Network plugin and version (if this is a network-related bug):
- Others: