Alert about spawning DHCP server into the cluster via helm (dgxie service) #41

Closed
hightoxicity opened this issue Dec 6, 2018 · 8 comments

@hightoxicity
Contributor

hightoxicity commented Dec 6, 2018

While using helm to spawn the DHCP server (dgxie service), for an unknown reason we lost the docker daemon on the master; after that the kubelet could no longer keep critical services like the apiserver running or restart them... A reboot allowed docker + kubelet to recover.
But some worker nodes lost their IPs (they were not able to renew their leases during the incident), and we ended up with an unhealthy ceph cluster. Looking at the dgxie pod state after the reboot, the pod was stuck in the ContainerCreating state due to the partial ceph failure (the volume claim was stuck).
Finally everything recovered after replacing the volume claims with empty volumes at dgxie helm service creation: the dgxie service could then start, the nodes recovered their IPs, the ceph cluster went healthy again, and volume claims could be satisfied.

So two things here:

  • Spawning the dgxie service into the k8s cluster should come with an HA mechanism (DHCP is a critical service)
  • We should think about what the true dgxie storage requirements are and ensure dgxie storage resiliency
@hightoxicity
Contributor Author

I did some initial work to be able to run several dgxie instances on several master nodes:

hightoxicity@9bdf03b

In fact, all the dgxie instances can serve the static IP leases (no problem there). But to avoid collisions on the dynamic ranges, I added something to split the IP range between the replicas of a StatefulSet (a kind of consistent hashing).
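
To illustrate the idea (this is only a rough sketch, not the actual dgxie code; the hostname parsing and the IPv4-only handling are assumptions), each replica could derive its own slice of the dynamic range from its StatefulSet ordinal:

```go
// rangeshard.go: minimal sketch of splitting a dynamic DHCP range between
// StatefulSet replicas. Assumptions: each replica reads its ordinal from the
// pod hostname suffix (e.g. "dgxie-2" -> 2), and the range is IPv4.
package main

import (
	"encoding/binary"
	"fmt"
	"net"
	"os"
	"strconv"
	"strings"
)

// ordinalFromHostname extracts the StatefulSet ordinal from a pod name
// such as "dgxie-1".
func ordinalFromHostname(hostname string) (int, error) {
	idx := strings.LastIndex(hostname, "-")
	if idx < 0 {
		return 0, fmt.Errorf("no ordinal in hostname %q", hostname)
	}
	return strconv.Atoi(hostname[idx+1:])
}

// shardRange splits [start, end] into `replicas` contiguous slices and
// returns the slice owned by `ordinal`.
func shardRange(start, end net.IP, replicas, ordinal int) (net.IP, net.IP) {
	s := binary.BigEndian.Uint32(start.To4())
	e := binary.BigEndian.Uint32(end.To4())
	per := (e - s + 1) / uint32(replicas)
	lo := s + uint32(ordinal)*per
	hi := lo + per - 1
	if ordinal == replicas-1 { // last replica takes any remainder
		hi = e
	}
	loIP, hiIP := make(net.IP, 4), make(net.IP, 4)
	binary.BigEndian.PutUint32(loIP, lo)
	binary.BigEndian.PutUint32(hiIP, hi)
	return loIP, hiIP
}

func main() {
	hostname, _ := os.Hostname() // e.g. "dgxie-1"
	ord, err := ordinalFromHostname(hostname)
	if err != nil {
		ord = 0
	}
	lo, hi := shardRange(net.ParseIP("192.168.1.100"), net.ParseIP("192.168.1.199"), 2, ord)
	fmt.Printf("replica %d serves dynamic range %s - %s\n", ord, lo, hi)
}
```

With 2 replicas and a range of 192.168.1.100-192.168.1.199, replica 0 would serve .100-.149 and replica 1 would serve .150-.199, so the two instances never hand out the same dynamic address.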

To have distinct volumes for persisting leases and to avoid volume claim collisions, I switched from a ceph claim to a local mount point.

This last choice also means a critical component no longer depends on ceph cluster health.

Please tell me what you think about it!

Thx

@hightoxicity
Contributor Author

It also allows scaling the dgxie service out to x nodes (spreading some load across more master nodes)...

@hightoxicity
Contributor Author

We are currently not able to set a fixed key for signing URLs at the pixiecore level (https://github.com/google/netboot/blob/cc33920b4f3296801a64d731d269978116f40d92/pixiecore/booters.go#L137).

@hightoxicity
Contributor Author

I closed a previous PR #43 since URL nonces cannot be verified across different pixiecore instances.
To make dgxie HA possible, we need a way to provide the pixiecore signing key from "outside" (an env var fed by a kube secret, for example).

I am trying to submit my work on the subject here:

danderson/netboot#84
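
For reference, the shape of the change I am proposing upstream looks roughly like this (a sketch only; the env var name PIXIECORE_SIGNING_KEY and the 32-byte key size are my assumptions for illustration, not a confirmed pixiecore API):

```go
// signingkey.go: sketch of the idea behind the upstream change: let all
// replicas share one URL-signing key instead of each generating a random one.
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"os"
)

// signingKey returns a 32-byte key: the hex-decoded value of the env var if
// it is set (so every replica signs and verifies the same URLs), otherwise a
// freshly generated random key (the current single-instance behaviour).
func signingKey() ([32]byte, error) {
	var key [32]byte
	if v := os.Getenv("PIXIECORE_SIGNING_KEY"); v != "" {
		raw, err := hex.DecodeString(v)
		if err != nil || len(raw) != len(key) {
			return key, fmt.Errorf("PIXIECORE_SIGNING_KEY must be 64 hex chars")
		}
		copy(key[:], raw)
		return key, nil
	}
	if _, err := rand.Read(key[:]); err != nil {
		return key, err
	}
	return key, nil
}

func main() {
	key, err := signingKey()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("using signing key %x...\n", key[:4])
}
```

In the dgxie chart the env var would then be populated from a Kubernetes Secret (e.g. via secretKeyRef in the pod spec), so every replica signs and verifies the same URLs.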

@hightoxicity hightoxicity mentioned this issue Dec 12, 2018
@dholt
Contributor

dholt commented Dec 12, 2018

This is really interesting, nice work!

I don't understand how multiple dgxie instances would help with HA: if you split the DHCP pool and lose a replica, the nodes getting leases from that instance would still no longer be able to get IPs. So at best it seems like you'd (hopefully) only lose half of the cluster at a time vs the whole thing. Am I missing something?

@hightoxicity
Contributor Author

Hi, it is only the dynamic range that is split; all instances are able to provide a static lease for the cluster nodes because they all feed themselves from the pxe-machines configmap (if I understood correctly how the dgxie service works). This is where the solution provides true HA and avoids losing any k8s worker node. The downside is that the dynamic range is not very well managed by the solution (the ability to renew a lease belongs to only one shard, and we do not control which server answers a DHCP request first, so a new lease may simply be issued from the fragment owned by the server that answered), but it should work as long as dynamic IPs are the exception rather than the rule.

@dholt
Contributor

dholt commented Dec 13, 2018

Gotcha. Seems reasonable, I'll try it out as soon as I can.

@supertetelman
Collaborator

Marking this as stale.
