-
Notifications
You must be signed in to change notification settings - Fork 1.4k
[Bug]: maintenanceTimeWindows is not respected for self-managed certificates #11236
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Can you clarify a little bit how are you using cert-manager here? I guess that when it was implemented it was done on purpose. The @katheris because you are working on supporting cert-manager as well, I would be interested to know your opinion as well. |
Cert-manager has an intermediate CA so it can create certificates for kafka instances. We are still migrating to kubernetes and a lot of services still run outside kubernetes clusters, so injecting the strimzi-generated CA into the clients is quite a bit harder. Therefore, having cert-manager create certificates whose chain goes to a root CA (company-wide) we have already added to all machines is much easier. We have simply created Certificate resources with the corresponding dnsNames, and added the generated secret to the Kafka listener spec. The certificates have an expiry of several months and we have weekly maintenance periods, so it is very unlikely strimzi would be unable to restart the kafka pods before the current cert expires. For now we have added a 1-year duration on the cert-manager Certificate resource that is calculated to happen during a maintenance period, so we should be good until the start of 2027. Strimzi will probably have some support for cert-manager or this issue by that point. |
Hey, I agree with what @ppatierno said that it is intentional that the maintenance windows are only taken into account when Strimzi is managing certificate renewal. The docs state That being said, I do think it's a valid point that the future cert-manager integration should allow users to specify maintenance periods. @bh-tt it would be great if you could take a look at this proposal and see if that makes sense to you for how you use cert-manager. Any feedback you have would be greatly appreciated. The proposal doesn't include mention of maintenance periods, but I think we should take that into account. @ppatierno do you think it is reasonable to include it directly in the implementation, or should I write a short proposal covering it to allow for discussion? |
Proposal looks good, a few small points:
Rest is good with how we use kafka. |
@katheris I would go for an update on the proposal with ... another proposal. It's anyway going to have an impact on the overall process I think. |
Hey @bh-tt on your qns:
Yes, this follows the naming conventions we have elsewhere in Strimzi configuration
In the example of the
Are there particular fields you expect you would need to override? As far as I can tell most of the fields are already accounted for in the Strimzi API or we wouldn't expect users to change. That being said we do provide a template option for other resources, so we could consider it for the Certificate resource if there were enough fields that a user might need to override. I've created the pull request for the proposal around maintenance time windows, please take a look when you get time to make sure this would satisfy your requirements: strimzi/proposals#153 |
Looks good to me @katheris . This would work exactly as I thought it should, and reusing the existing maintenanceTimeWindows is likely the correct option. Fields I'd expect to override sometimes:
At this point about half of the certificate resource is configurable already, so then a template may be the correct thing to do, if only to reduce future maintenance burden (can strimzi support field X on certificates?-like features). |
@bh-tt thanks that list is useful. For info the duration/renewBefore is already in the Strimzi API. There is also an issue open that would address the private key size, since that is something that was requested for the Strimzi certificates. I'll make sure that feature is applied to the cert-manager implementation |
Triaged on 6.3.2025: let's leave this open for more thoughts to be discussed in the next community call. |
@ppatierno when you are considering this I forgot to mention that as Jakub stated above this particular issue is for custom listener certificates, but I think the same thing we discussed in the call stands which is that it would be hard to implement this since the Secrets are being mounted to the Kafka pods directly the next time the pods are rolled for any reason they would pick up the new values. I think it is an opportunity for us to be clear about how the maintenance time windows apply to any certificate updates though and make sure the documentation is also clear on this. |
@katheris yeah I see your point. So I guess we should just try to make it clearer in the doc. |
Triaged on 15.5.2025: We should update the documentation to make it clearer that this only applies to certificates that Strimzi generates. There is a separate discussion happening about cert-manager and whether it should apply once that is supported by Strimzi, but we will keep that discussion separate. |
Bug Description
We are using cert-manager to create certificates for kafka brokers, however on each rotation all brokers are restarted. I thought that the maintenanceTimeWindows setting would prevent this. When looking at the code itself, it seems this setting is only used when strimzi-generated certificates need to be rotated.
See
strimzi-kafka-operator/cluster-operator/src/main/java/io/strimzi/operator/cluster/operator/assembly/KafkaReconciler.java
Line 915 in e36f4a8
A fairly easy fix would be to include the isMaintenanceTimeWindow as an argument to ReconcilerUtils#trackedServerCertChanged, which is only called in that single reconciler function. I can make a PR if that is desired.
Steps to reproduce
Expected behavior
I'd expect strimzi to respect all 'out of the blue' changes to kafka pods (e.g. without changes to Kafka or KafkaNodePool resources) to only apply during the next maintenanceTimeWindows.
Strimzi version
0.45.0
Kubernetes version
1.31.6
Installation method
helm chart
Infrastructure
bare-metal
Configuration files and logs
No response
Additional context
I understand this may not have initially been necessary, as self-managed certificates usually means 'the cluster operator (human) renewing the certificate knows what they are doing), but with cert-manager most of this goes away. The price is that cert-manager will rotate the certificate when necessary, which for most apps is not a problem given hot-reloading capabilities.
The text was updated successfully, but these errors were encountered: