Alert about spawning DHCP server into the cluster via helm (dgxie service) #41
Comments
I did some initial work to be able to run several dgxie instances on several master nodes. All the dgxie instances can serve the static IP leases (no conflict there). But to avoid collisions on the dynamic ranges, I added logic to split the IP range between the replicas of a StatefulSet (a kind of consistent hashing) — see the sketch below. To have distinct volumes to persist leases and to avoid volume claim collisions, I switched from a Ceph claim to a local mount point. That last choice also means this critical component no longer depends on the Ceph cluster's health. Please tell me what you think about it! Thx
It also allows scaling the dgxie service to N nodes (spreading some load across more master nodes)...
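A minimal sketch of that splitting scheme, assuming Python and StatefulSet ordinal hostnames — the function names, environment variables, and range bounds here are illustrative, not taken from the dgxie chart:

```python
# Hypothetical sketch: shard a dynamic DHCP range between StatefulSet
# replicas by pod ordinal, so no two replicas hand out the same address.
import ipaddress
import os
import re

def replica_ordinal(hostname: str) -> int:
    """Extract the ordinal from a StatefulSet pod name, e.g. 'dgxie-2' -> 2."""
    match = re.search(r"-(\d+)$", hostname)
    if match is None:
        raise ValueError(f"not a StatefulSet hostname: {hostname!r}")
    return int(match.group(1))

def shard_range(first: str, last: str, replicas: int, ordinal: int):
    """Return this replica's slice of the dynamic range [first, last]."""
    lo = int(ipaddress.IPv4Address(first))
    hi = int(ipaddress.IPv4Address(last))
    size = (hi - lo + 1) // replicas
    start = lo + ordinal * size
    # The last replica absorbs any remainder of the division.
    end = hi if ordinal == replicas - 1 else start + size - 1
    return ipaddress.IPv4Address(start), ipaddress.IPv4Address(end)

if __name__ == "__main__":
    ordinal = replica_ordinal(os.environ.get("HOSTNAME", "dgxie-0"))
    replicas = int(os.environ.get("REPLICAS", "2"))  # placeholder env var
    print(shard_range("192.168.1.100", "192.168.1.199", replicas, ordinal))
```

With two replicas, `dgxie-0` would own .100–.149 and `dgxie-1` would own .150–.199, so the shards are disjoint by construction.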
We are currently not able to set a fixed key to sign URLs at the pixiecore level (https://github.com/google/netboot/blob/cc33920b4f3296801a64d731d269978116f40d92/pixiecore/booters.go#L137).
I closed a previous PR #43 since URL nonces cannot be checked across different pixiecore instances. I am trying to submit my work on the subject here:
This is really interesting, nice work! I don't understand how multiple dgxie instances would help HA: if you split the DHCP pool and lose a replica, the nodes getting leases from that instance would still no longer be able to get IPs. So at best it seems like you'd (hopefully) only lose half of the cluster at a time vs the whole thing. Am I missing something?
Hi, it is just the dynamic range that is split; all instances are able to provide a static lease for the cluster nodes because they all read from the pxe-machines ConfigMap (if I understood correctly how the dgxie service works). This is where the solution provides true HA, avoiding the loss of any k8s worker node. The counterpart is that the dynamic range is not very well managed by this solution: the ability to renew a lease is held by only one shard, and we don't control which server answers a DHCP request first, so a new lease may simply be issued from the range fragment owned by whichever server wins. But it should work as long as dynamic IPs are the exception rather than the rule.
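A hedged sketch of the lease decision described in this comment — placeholder names and data, not dgxie's actual implementation:

```python
# Hypothetical sketch: every replica answers static leases from the
# shared pxe-machines ConfigMap data, but offers dynamic leases only
# from its own shard of the range.
from ipaddress import IPv4Address
from typing import Optional

# Static MAC -> IP map, as each replica would load it from the
# pxe-machines ConfigMap (placeholder entries).
STATIC_LEASES = {
    "aa:bb:cc:dd:ee:01": IPv4Address("192.168.1.11"),
    "aa:bb:cc:dd:ee:02": IPv4Address("192.168.1.12"),
}

def offer(mac: str, shard_start: IPv4Address, shard_end: IPv4Address,
          in_use: set[IPv4Address]) -> Optional[IPv4Address]:
    """Pick an address to offer for a DHCPDISCOVER, or None if exhausted."""
    # Static leases first: identical on every replica, so any surviving
    # replica can answer and cluster nodes keep their IPs (the HA case).
    if mac in STATIC_LEASES:
        return STATIC_LEASES[mac]
    # Dynamic leases: only from this replica's shard, so replicas never
    # collide. Whichever replica's OFFER reaches the client first wins,
    # which is why cross-shard renewals are the weak spot noted above.
    addr = int(shard_start)
    while addr <= int(shard_end):
        candidate = IPv4Address(addr)
        if candidate not in in_use:
            return candidate
        addr += 1
    return None
```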
Gotcha. Seems reasonable, I'll try it out as soon as I can.
Marking this as stale. |
While using helm to spawn the DHCP server (dgxie service), for an unknown reason we lost the Docker daemon on the master; following that, the kubelet could no longer keep running or start critical services like the apiserver. A reboot allowed Docker and the kubelet to recover.
But some worker nodes lost their IPs (they were not able to renew their leases during the incident), and we ended up with an unhealthy Ceph cluster. Looking at the dgxie pod state after the reboot, the pod was stuck in the ContainerCreating state due to the partial Ceph failure (the volume claim was stuck).
Finally, everything recovered after replacing the volume claims with empty volumes at dgxie helm service creation: from there the dgxie service could start, the nodes recovered their IPs, the Ceph cluster went healthy, and every volume claim could be satisfied.
So two things here: