Etcd Certificates are not generated when adding nodes to an existing cluster with scale.yml #12120
Conversation
Welcome @kartsank!
Hi @kartsank. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/ok-to-test
/retest
You're duplicating the group definition in the task above (with group_by).
Even if add_host is the right approach, we should only keep one group generation.
An existing group was created using inventory_hostname (etcd, control-plane). If the etcd play is executed without the kube_node or new_host inventory group, the node certificate will not be generated on the first etcd node. To address this, I am re-evaluating all hosts in the k8s_cluster instead of just the inventory_hostname hosts and adding any missing hosts to the existing group.
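As a rough illustration of that approach (the task below paraphrases the description above rather than copying the PR code; gen_node_certs_True is the existing certificate group named in the PR description):

# Illustrative sketch: re-evaluate all k8s_cluster hosts and add any
# host missing from the existing cert group, instead of relying only
# on groups built from inventory_hostname.
- name: Add missing k8s_cluster hosts to the node cert group
  ansible.builtin.add_host:
    name: "{{ item }}"
    groups: gen_node_certs_True
  loop: "{{ groups['k8s_cluster'] | default([]) }}"
  when: item not in (groups['gen_node_certs_True'] | default([]))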
I see where the problem is.
This was fixed for cluster.yml and upgrade-cluster.yml in #10769, but scale.yml does not use the common playbook install-etcd.yml, which has the correct behavior (dynamically add only the needed hosts, i.e. nodes using a direct etcd client).
I don't think there is anything preventing us from using the same thing in scale.yml, which should fix the problem.
Initially, I considered including install-etcd.yml, but it adds all kube_node hosts to the groups rather than just the missing nodes. My goal is to create the node certificate only for the missing or new hosts, rather than re-creating node certificates on every run.
https://github.com/kubernetes-sigs/kubespray/blob/master/playbooks/install_etcd.yml#L1C1-L15C17
If force_etcd_cert_refresh is set to true, we can add all hosts. Otherwise, there is no need to re-create the node certificate on every execution of scale.yml. Additionally, installing etcd is not required for scale.yml unless etcd_cluster_setup is enabled for the new nodes.
https://github.com/kubernetes-sigs/kubespray/blob/master/playbooks/install_etcd.yml#L17-L29
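A hedged sketch of the distinction being drawn here (names other than kube_node and force_etcd_cert_refresh are made up; the real logic lives in the linked install_etcd.yml):

# Illustrative only: with force_etcd_cert_refresh, every worker is added;
# otherwise only hosts whose node certificate is missing are added.
# 'etcd_cert_targets' and 'hosts_with_existing_certs' are hypothetical names.
- name: Select hosts that need a node certificate
  ansible.builtin.add_host:
    name: "{{ item }}"
    groups: etcd_cert_targets
  loop: "{{ groups['kube_node'] | default([]) }}"
  when: >-
    (force_etcd_cert_refresh | default(false) | bool)
    or (item not in (hosts_with_existing_certs | default([])))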
Does group_by still consider --limit, or does it not? So even with force_etcd_cert_refresh + scale.yml with --limit (which is the intended usage), I don't see the advantage.
I believe the limit is not considered by group_by. Therefore, we have two options: we can either include install-etcd.yml within scale.yml, or we can add the kube_node group to the existing ETCD play in scale.yml.
https://github.com/kubernetes-sigs/kubespray/blob/master/playbooks/scale.yml#L9
I don't think so; limit is not considered by group_by.
It does, I just tested.
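A minimal way to check this claim yourself (the file and group names here are made up for the demonstration):

# limit_test.yml: run once without --limit and once with it, and
# compare the contents of the dynamic group.
- name: Show that group_by only sees hosts inside --limit
  hosts: all
  gather_facts: false
  tasks:
    - name: Put every targeted host into a dynamic group
      ansible.builtin.group_by:
        key: limited_group
    - name: Print the resulting membership
      ansible.builtin.debug:
        var: groups['limited_group']
      run_once: true

Running ansible-playbook -i inventory limit_test.yml --limit node1 leaves only node1 in limited_group, because group_by executes only on the play's (limited) hosts.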
I tested install_etcd.yml in scale.yml, and all tests were successful. Therefore, I will include install_etcd.yml in scale.yml to resolve the add_node certificate issue. Thank you for the feedback. I will update this pull request to include install_etcd.yml. @VannTen Please review and provide your comments.
Force-pushed from f077093 to 872787e
@VannTen Please review and provide your comments. Updated scale.yml with the install_etcd playbook.
Force-pushed from 9ea55a8 to f1e8bda
Force-pushed from f1e8bda to c33c394
vars:
  etcd_cluster_setup: false
  etcd_events_cluster_setup: false
The only slight concern I have is this, which differs from install-etcd.yml.
However, this only affects 'etcd', so it would only be a problem when using scale.yml with etcd nodes, which is not supported. I wonder if we have an easy way to fail early if something like this is attempted 🤔
Maybe that's out of scope for this though.
@tico88612 @ant31 thoughts?
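One possible shape for such a guard, as a purely hypothetical sketch (nothing like this is in the PR):

# Hypothetical pre-flight check: abort scale.yml early if any
# targeted host is an etcd member.
- name: Guard against scaling etcd nodes
  hosts: all
  gather_facts: false
  tasks:
    - name: Fail early when scale.yml targets an etcd host
      ansible.builtin.assert:
        that: groups['etcd'] | default([]) | intersect(ansible_play_hosts_all) | length == 0
        fail_msg: "Adding etcd nodes with scale.yml is not supported; use cluster.yml."
      run_once: true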
Can we set the variables in the import? Or with set_fact before?
The default value for etcd_cluster_setup is set to true. However, this can be changed to false by configuring the variables in scale.yml.
https://github.com/kubernetes-sigs/kubespray/blob/master/roles/etcd/defaults/main.yml#L6
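For reference, the default being discussed, as stated above (excerpt paraphrased from the linked roles/etcd/defaults/main.yml):

# roles/etcd/defaults/main.yml (excerpt)
etcd_cluster_setup: true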
Oh right, I don't know why I thought we could not set vars on import_playbook.
In that case we'd define them in both cases at the import_playbook level and be done with it.
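A minimal sketch of what that looks like in scale.yml, assuming the vars block from the diff hunk above is moved onto the import itself:

# Sketch: scale.yml imports the shared etcd playbook and disables the
# cluster-setup steps at the import_playbook level.
- name: Install etcd
  ansible.builtin.import_playbook: install_etcd.yml
  vars:
    etcd_cluster_setup: false
    etcd_events_cluster_setup: false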
updated.
You'll need to update cluster.yml and upgrade-cluster.yml as well.
playbooks/install_etcd.yml (outdated):
vars:
  etcd_cluster_setup: true
  etcd_events_cluster_setup: "{{ etcd_events_cluster_enabled }}"
So the whole section should be scrapped and handled at the import_playbook level on both use sites.
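Applied to the other use sites, that would look roughly like this, mirroring the vars from the outdated install_etcd.yml hunk above:

# Sketch for cluster.yml / upgrade-cluster.yml: the vars move out of
# install_etcd.yml and onto each import.
- name: Install etcd
  ansible.builtin.import_playbook: install_etcd.yml
  vars:
    etcd_cluster_setup: true
    etcd_events_cluster_setup: "{{ etcd_events_cluster_enabled }}"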
@VannTen Updated cluster.yml and upgrade_cluster.yml. Please review. Thanks.
Updated.
Thanks for your patience!
That should do it.
/approve
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: kartsank, VannTen
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
Thanks for your review support and approval.
What type of PR is this?
/kind bug
What this PR does / why we need it:
It adds any new or missing hosts whose etcd node certificate is absent to the gen_node_certs_True group as part of check_certs, which the gen_certs task then uses to create the certificates.
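A hedged sketch of that mechanism (everything except the gen_node_certs_True group name is illustrative):

# Illustrative: during check_certs, each host records whether its node
# certificate is missing, then group_by turns that into the
# gen_node_certs_True group consumed by gen_certs.
# 'etcd_node_cert_exists' is a made-up fact name.
- name: Record whether this host needs a node certificate
  ansible.builtin.set_fact:
    gen_node_certs: "{{ not (etcd_node_cert_exists | default(false)) }}"

- name: Group hosts by certificate need
  ansible.builtin.group_by:
    key: "gen_node_certs_{{ gen_node_certs }}"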
Which issue(s) this PR fixes:
Fixes #12117
Special notes for your reviewer:
Does this PR introduce a user-facing change?:
None