Panic in OpenStackMachineReconciler if OpenStackCluster.Status.Network is nil (Hosted Control Plane scenario) #2380

Open
@bnallapeta

/kind bug

What steps did you take and what happened:

In a “hosted control plane” setup (where the control plane runs outside of OpenStack, and only worker nodes are provisioned in OpenStack), OpenStackCluster.Status.Network can remain nil. Currently, the CAPO code in OpenStackMachineReconciler.getOrCreateMachineServer() assumes openStackCluster.Status.Network is always non-nil. This leads to a nil pointer dereference (panic) when calling:

machineServerSpec := openStackMachineSpecToOpenStackServerSpec(
    &openStackMachine.Spec,
    identityRef,
    compute.InstanceTags(&openStackMachine.Spec, openStackCluster),
    failureDomain,
    userDataRef,
    getManagedSecurityGroup(openStackCluster, machine),
    openStackCluster.Status.Network.ID,  // <- panic if .Network is nil
)

The controller then crashes, making it impossible to provision worker nodes.

  • In hosted control plane (HCP) scenarios, there is no control-plane node running in OpenStack, so CAPO never populates OpenStackCluster.Status.Network.
  • The machine reconciliation panics in openstackmachine_controller.go due to a nil pointer dereference on openStackCluster.Status.Network.ID.

Logs:

I0116 03:44:55.377796       1 openstackmachine_controller.go:361] "Reconciling Machine" controller="openstackmachine" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="OpenStackMachine" OpenStackMachine="kcm-system/openstack-dev-hosted-cp-md-fcpqk-8l4q5" namespace="kcm-system" name="openstack-dev-hosted-cp-md-fcpqk-8l4q5" reconcileID="b00cfcbb-ae39-4bb9-aa87-0bcde7cb350d" openStackMachine="openstack-dev-hosted-cp-md-fcpqk-8l4q5" machine="openstack-dev-hosted-cp-md-fcpqk-8l4q5" cluster="openstack-dev-hosted-cp" openStackCluster="openstack-dev-hosted-cp"
I0116 03:44:55.378942       1 controller.go:110] "Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference" controller="openstackmachine" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="OpenStackMachine" OpenStackMachine="kcm-system/openstack-dev-hosted-cp-md-fcpqk-8l4q5" namespace="kcm-system" name="openstack-dev-hosted-cp-md-fcpqk-8l4q5" reconcileID="b00cfcbb-ae39-4bb9-aa87-0bcde7cb350d"
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
        panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x1baafba]

goroutine 357 [running]:
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.5/pkg/internal/controller/controller.go:111 +0x1e5
panic({0x1dccbe0?, 0x362a670?})
        /usr/local/go/src/runtime/panic.go:770 +0x132
sigs.k8s.io/cluster-api-provider-openstack/controllers.(*OpenStackMachineReconciler).getOrCreateMachineServer(0xc00043a2a0, {0x2440550, 0xc0005b7230}, 0xc0004deb08, 0xc0006bc508, 0xc0008fa008)
        /workspace/controllers/openstackmachine_controller.go:586 +0x35a
sigs.k8s.io/cluster-api-provider-openstack/controllers.(*OpenStackMachineReconciler).reconcileMachineServer(0x24467c8?, {0x2440550?, 0xc0005b7230?}, 0xc0006d1560, 0x13?, 0x0?, 0x0?)
        /workspace/controllers/openstackmachine_controller.go:544 +0x3d
sigs.k8s.io/cluster-api-provider-openstack/controllers.(*OpenStackMachineReconciler).reconcileNormal(0xc00043a2a0, {0x2440550, 0xc0005b7230}, 0xc0006d1560, {0xc000059500, 0x22}, 0xc0004deb08, 0xc0008fa008, 0xc0006bc508)
        /workspace/controllers/openstackmachine_controller.go:363 +0x178
sigs.k8s.io/cluster-api-provider-openstack/controllers.(*OpenStackMachineReconciler).Reconcile(0xc00043a2a0, {0x2440550, 0xc0005b7230}, {{{0xc0006b5576?, 0x0?}, {0xc00059d050?, 0xc0008f1d10?}}})
        /workspace/controllers/openstackmachine_controller.go:161 +0xbd8
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x24467c8?, {0x2440550?, 0xc0005b7230?}, {{{0xc0006b5576?, 0xb?}, {0xc00059d050?, 0x0?}}})
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.5/pkg/internal/controller/controller.go:114 +0xb7
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0004f2160, {0x2440588, 0xc00022f810}, {0x1e96420, 0xc00003d920})
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.5/pkg/internal/controller/controller.go:311 +0x3bc
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0004f2160, {0x2440588, 0xc00022f810})
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.5/pkg/internal/controller/controller.go:261 +0x1be
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.5/pkg/internal/controller/controller.go:222 +0x79
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2 in goroutine 203
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.5/pkg/internal/controller/controller.go:218 +0x486

What did you expect to happen:
CAPO should handle the absence of status.network gracefully, for example by setting a condition on the OpenStackMachine or requeueing, rather than panicking.
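The expected behavior could look roughly like the sketch below: when the cluster network is missing, record a condition on the machine and ask for a requeue instead of dereferencing the pointer. Everything here is an assumption made for illustration. The types are local stand-ins rather than the CAPO or controller-runtime definitions, and the condition type `NetworkReady`, the reason `ClusterNetworkNotSet`, and the 30-second delay are hypothetical.

```go
package main

import "time"

// network mirrors the shape of OpenStackCluster.Status.Network.
type network struct{ ID string }

// cluster stands in for OpenStackCluster; Network may be nil.
type cluster struct{ Network *network }

// condition is a minimal stand-in for a status condition.
type condition struct {
	Type, Status, Reason, Message string
}

// machine stands in for OpenStackMachine.
type machine struct{ Conditions []condition }

// result stands in for a reconcile result carrying a requeue delay.
type result struct{ RequeueAfter time.Duration }

// waitForNetwork is a hypothetical requeue delay.
const waitForNetwork = 30 * time.Second

// setCondition replaces the condition of the same type or appends a new one.
func setCondition(m *machine, c condition) {
	for i := range m.Conditions {
		if m.Conditions[i].Type == c.Type {
			m.Conditions[i] = c
			return
		}
	}
	m.Conditions = append(m.Conditions, c)
}

// reconcileServer records a condition and requests a requeue when the
// cluster network is missing; otherwise it marks the network as ready.
func reconcileServer(c *cluster, m *machine) result {
	if c == nil || c.Network == nil {
		setCondition(m, condition{
			Type:    "NetworkReady", // hypothetical condition type
			Status:  "False",
			Reason:  "ClusterNetworkNotSet", // hypothetical reason
			Message: "OpenStackCluster.Status.Network is nil",
		})
		return result{RequeueAfter: waitForNetwork}
	}
	setCondition(m, condition{Type: "NetworkReady", Status: "True"})
	// Server creation using c.Network.ID would follow here.
	return result{}
}
```

Requeueing alone only prevents the crash. In the hosted control plane case, where Status.Network is never populated, the machine would keep waiting, so a complete fix would also need a way to obtain the network without relying on that field.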

Environment:

  • Cluster API Provider OpenStack version (Or git rev-parse HEAD if manually built):
  • Cluster-API version:
  • OpenStack version:
  • Minikube/KIND version:
  • Kubernetes version (use kubectl version):
  • OS (e.g. from /etc/os-release):

Labels

  • kind/bug: Categorizes issue or PR as related to a bug.
  • lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.
