-
Notifications
You must be signed in to change notification settings - Fork 272
Fix panic when OpenStack server is deleted by an external agent #2475
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix panic when OpenStack server is deleted by an external agent #2475
Conversation
Skipping CI for Draft Pull Request. |
✅ Deploy Preview for kubernetes-sigs-cluster-api-openstack ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
@EmilienM Can you take a look at this? I'm slightly uncomfortable with the interactions between the machine and server controllers here. Specifically the machine controller is re-litigating the value of I was also wondering if there's an easy way to add an e2e test for this. What do you think? Footnotes
|
/test all |
/test all |
4e4ae5b
to
22c9b36
Compare
/test all |
as I said on Slack, this PR looks good to me, we probably want to sync the conditions at some point. |
wantErr: false, | ||
wantServer: &servers.Server{ | ||
ID: instanceUUID, | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The asymmetry of not setting InstanceReadyCondition here is weird, but it's weirdly asymmetric in the function under test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know! The condition is set later after getOrCreateServer is called 😢
I'll propose a move of that in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mhh actually we reconcile until the server has been created and then we check its state later in the switch:
22c9b36
to
281f5b2
Compare
I had a look at copying Conditions from OpenStackServer and it's a can of 🪱 to be honest, I suggest we tackle this later. |
@EmilienM: The
The following commands are available to trigger optional jobs:
Use
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
/test pull-cluster-api-provider-openstack-e2e-full-test |
2049225
to
2bfe7a2
Compare
The panic happened because we did not correctly handle that GetInstanceStatus returns a nil server if the server does not exist, rather than a 404 error. We had the same oversight in the OpenStackServer controller. It looks like this would have resulted in recreating the server.
2bfe7a2
to
4a2a8b7
Compare
@mdbooth I think it worked: https://storage.googleapis.com/kubernetes-ci-logs/pr-logs/pull/kubernetes-sigs_cluster-api-provider-openstack/2475/pull-cluster-api-provider-openstack-e2e-test/1899941891034583040/artifacts/clusters/bootstrap/resources/e2e-4g1zo5/OpenStackMachine/cluster-e2e-4g1zo5-control-plane-9g925.yaml status:
addresses:
- address: 10.6.0.211
type: InternalIP
- address: cluster-e2e-4g1zo5-control-plane-9g925
type: InternalDNS
conditions:
- lastTransitionTime: "2025-03-12T22:51:08Z"
message: server has been unexpectedly deleted
reason: InstanceDeleted
severity: Error
status: "False"
type: Ready
- lastTransitionTime: "2025-03-12T22:47:14Z"
status: "True"
type: APIServerIngressReadyCondition
- lastTransitionTime: "2025-03-12T22:51:08Z"
message: server has been unexpectedly deleted
reason: InstanceDeleted
severity: Error
status: "False"
type: InstanceReady
failureMessage: server has been unexpectedly deleted
failureReason: UpdateError
instanceID: 25be8125-214b-473a-8f02-9b3ad44d424b
ready: true |
/hold cancel |
/test pull-cluster-api-provider-openstack-e2e-full-test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/cherry-pick release-0.12 |
@EmilienM: once the present PR merges, I will cherry-pick it on top of In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
/lgtm I'll let Matt approve it. |
Thanks! /approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: mdbooth The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@EmilienM: new pull request created: #2477 In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
/cherry-pick release-0.11 |
@mnaser: new pull request created: #2507 In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
The panic happened because we did not correctly handle that GetInstanceStatus returns a nil server if the server does not exist, rather than a 404 error.
We had the same oversight in the OpenStackServer controller. It looks like this would have resulted in recreating the server.
Fixes: #2474
/hold