Skip to content

[Bug]: maintenanceTimeWindows is not respected for self-managed certificates #11236

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
bh-tt opened this issue Mar 7, 2025 · 13 comments · May be fixed by #11457
Open

[Bug]: maintenanceTimeWindows is not respected for self-managed certificates #11236

bh-tt opened this issue Mar 7, 2025 · 13 comments · May be fixed by #11457
Assignees

Comments

@bh-tt
Copy link

bh-tt commented Mar 7, 2025

Bug Description

We are using cert-manager to create certificates for kafka brokers, however on each rotation all brokers are restarted. I thought that the maintenanceTimeWindows setting would prevent this. When looking at the code itself, it seems this setting is only used when strimzi-generated certificates need to be rotated.

See

protected Future<Void> rollingUpdate(Map<String, ReconcileResult<StrimziPodSet>> podSetDiffs) {

A fairly easy fix would be to include the isMaintenanceTimeWindow as an argument to ReconcilerUtils#trackedServerCertChanged, which is only called in that single reconciler function. I can make a PR if that is desired.

Steps to reproduce

  1. create a certificate for brokers
  2. create a kafka instance with maintenanceTimeWindows set to a different day than now
  3. wait for ready
  4. update the certificate (say, rotate it so the notAfter changes)
  5. watch as strimzi proceeds to roll all the pods

Expected behavior

I'd expect strimzi to respect all 'out of the blue' changes to kafka pods (e.g. without changes to Kafka or KafkaNodePool resources) to only apply during the next maintenanceTimeWindows.

Strimzi version

0.45.0

Kubernetes version

1.31.6

Installation method

helm chart

Infrastructure

bare-metal

Configuration files and logs

No response

Additional context

I understand this may not have initially been necessary, as self-managed certificates usually means 'the cluster operator (human) renewing the certificate knows what they are doing), but with cert-manager most of this goes away. The price is that cert-manager will rotate the certificate when necessary, which for most apps is not a problem given hot-reloading capabilities.

@ppatierno
Copy link
Member

Can you clarify a little bit how are you using cert-manager here?

I guess that when it was implemented it was done on purpose.
If the user is managing the certificates, they should know when rotation is going to happen or they are anyway in control of it. Anyway I can see your point because cert-manager is going to rotate for you when it's needed but maybe you don't want to do the upgrade yet.
Actually the code which is in charge of renewal by taking into account the maintainance window is here:

https://github.com/strimzi/strimzi-kafka-operator/blob/main/operator-common/src/main/java/io/strimzi/operator/common/model/Ca.java#L542

The maintenanceWindowSatisfied is totally ignored when CA certificate is handled by the user so that the renewal type is never RenewalType.POSTPONED.

@katheris because you are working on supporting cert-manager as well, I would be interested to know your opinion as well.

@bh-tt
Copy link
Author

bh-tt commented Mar 10, 2025

Cert-manager has an intermediate CA so it can create certificates for kafka instances. We are still migrating to kubernetes and a lot of services still run outside kubernetes clusters, so injecting the strimzi-generated CA into the clients is quite a bit harder. Therefore, having cert-manager create certificates whose chain goes to a root CA (company-wide) we have already added to all machines is much easier.

We have simply created Certificate resources with the corresponding dnsNames, and added the generated secret to the Kafka listener spec. The certificates have an expiry of several months and we have weekly maintenance periods, so it is very unlikely strimzi would be unable to restart the kafka pods before the current cert expires.

For now we have added a 1-year duration on the cert-manager Certificate resource that is calculated to happen during a maintenance period, so we should be good until the start of 2027. Strimzi will probably have some support for cert-manager or this issue by that point.

@katheris
Copy link
Contributor

Hey, I agree with what @ppatierno said that it is intentional that the maintenance windows are only taken into account when Strimzi is managing certificate renewal. The docs state Schedule certificate renewal updates by the Cluster Operator to Kafka or ZooKeeper clusters for minimal impact on client applications.

That being said, I do think it's a valid point that the future cert-manager integration should allow users to specify maintenance periods.

@bh-tt it would be great if you could take a look at this proposal and see if that makes sense to you for how you use cert-manager. Any feedback you have would be greatly appreciated.

The proposal doesn't include mention of maintenance periods, but I think we should take that into account. @ppatierno do you think it is reasonable to include it directly in the implementation, or should I write a short proposal covering it to allow for discussion?

@bh-tt
Copy link
Author

bh-tt commented Mar 10, 2025

Proposal looks good, a few small points:

  • cacert.secretName/cacert.certificate settings are not quite clear. Is cacert.certificate the key in the secret containing the certificate? Then I'd suggest using secretName and key like a standard SecretKeySelector. Additionally, it could be common for users to put a CA in a configmap (since it is public information), so it would be nice if strimzi supports that. Trust-manager is mentioned, which also only outputs to ConfigMaps?
  • I cannot find a mention whether strimzi will request certificates with isCA: true. That would likely be blocked in most organizations, including ours, since intermediate cA often have the same signing capabilities as their parent. Will strimzi request this for the cluster CA or will it use the given CA?
  • might be nice to simply accept a Certificate resource spec as an argument, that way users can kustomize all settings (where strimzi adds a few settings like dnsNames)?

Rest is good with how we use kafka.

@ppatierno
Copy link
Member

The proposal doesn't include mention of maintenance periods, but I think we should take that into account. @ppatierno do you think it is reasonable to include it directly in the implementation, or should I write a short proposal covering it to allow for discussion?

@katheris I would go for an update on the proposal with ... another proposal. It's anyway going to have an impact on the overall process I think.

@katheris
Copy link
Contributor

Hey @bh-tt on your qns:

Is cacert.certificate the key in the secret containing the certificate?

Yes, this follows the naming conventions we have elsewhere in Strimzi configuration

I cannot find a mention whether strimzi will request certificates with isCA: true.

In the example of the Certificate resource I set isCA to be false, and that is my intention for the implementation.

might be nice to simply accept a Certificate resource spec as an argument

Are there particular fields you expect you would need to override? As far as I can tell most of the fields are already accounted for in the Strimzi API or we wouldn't expect users to change. That being said we do provide a template option for other resources, so we could consider it for the Certificate resource if there were enough fields that a user might need to override.

I've created the pull request for the proposal around maintenance time windows, please take a look when you get time to make sure this would satisfy your requirements: strimzi/proposals#153

@bh-tt
Copy link
Author

bh-tt commented Mar 10, 2025

Looks good to me @katheris . This would work exactly as I thought it should, and reusing the existing maintenanceTimeWindows is likely the correct option.

Fields I'd expect to override sometimes:

  • duration/renewBefore: plenty of users may have very strict maintenance windows, so the default of cert-manager (90 days?) may be too short for some
  • private key size: some people may have requirements on the minimum key size
  • setting subject organization/country etc. may be required as well
  • I probably won't set URIs or emailAddresses, but someone with public kafka's may decide to?

At this point about half of the certificate resource is configurable already, so then a template may be the correct thing to do, if only to reduce future maintenance burden (can strimzi support field X on certificates?-like features).

@katheris
Copy link
Contributor

@bh-tt thanks that list is useful. For info the duration/renewBefore is already in the Strimzi API. There is also an issue open that would address the private key size, since that is something that was requested for the Strimzi certificates. I'll make sure that feature is applied to the cert-manager implementation

@scholzj
Copy link
Member

scholzj commented Mar 10, 2025

@katheris @bh-tt This is completely unrelated to the issue and should not be discussed here. This issue does not seem to be even about a custom CA but a custom listener certificate.

@ppatierno
Copy link
Member

Triaged on 6.3.2025: let's leave this open for more thoughts to be discussed in the next community call.

@katheris
Copy link
Contributor

@ppatierno when you are considering this I forgot to mention that as Jakub stated above this particular issue is for custom listener certificates, but I think the same thing we discussed in the call stands which is that it would be hard to implement this since the Secrets are being mounted to the Kafka pods directly the next time the pods are rolled for any reason they would pick up the new values. I think it is an opportunity for us to be clear about how the maintenance time windows apply to any certificate updates though and make sure the documentation is also clear on this.

@ppatierno
Copy link
Member

@katheris yeah I see your point. So I guess we should just try to make it clearer in the doc.

@katheris
Copy link
Contributor

Triaged on 15.5.2025: We should update the documentation to make it clearer that this only applies to certificates that Strimzi generates. There is a separate discussion happening about cert-manager and whether it should apply once that is supported by Strimzi, but we will keep that discussion separate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants