Skip to content

feat: generate a self-signed cert for registry addon #1127

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

dkoshkin
Copy link
Contributor

@dkoshkin dkoshkin commented May 17, 2025

What problem does this PR solve?:
This PR adds the functionality to set a self-signed certificate for the registry addon.
A new Certificate deployed by the Helm chart generates a 10 year CA Secret registry-addon-tls in the namespace where this controller is running.

The ca.crt data is copied into a Cluster specific Secret (with an OwnerRef set) in a BeforeClusterCreate hook and is then used by the mirror handler.

This CA is then used to sign a new 2 year TLS certificate for every cluster that is copied to the remote cluster registry-system/registry-tls and used by the registry Pods. A new cert is generated and copied on every invocation of the handler, on BeforeClusterUpgrade and AfterControlPlaneInitialized lifecycle hooks, so that we can punt on auto renewal for now.

Which issue(s) this PR fixes:
Fixes #

How Has This Been Tested?:

  1. Tested manually by create a Docker cluster and verifying I can push an image into the registry, observe it get synced across the replicas, and then get pulled with Containerd.
$ kubectl get secret -n registry-system
NAME                                                TYPE                 DATA   AGE
cncf-distribution-registry-docker-registry-secret   Opaque               3      5h16m
registry-tls                                        kubernetes.io/tls    3      5h16m
sh.helm.release.v1.cncf-distribution-registry.v1    helm.sh/release.v1   1      5h16m


$ kubectl port-forward --address=127.0.0.1 --namespace registry-system pod/cncf-distribution-registry-docker-registry-0 5000:5000
$ crane copy nginx:stable 127.0.0.1:5000/library/nginx:dkoshkin --insecure
$ kubectl run nginx-working --image=docker.io/library/nginx:dkoshkin
$ kubectl get pods
NAME                                                              READY   STATUS              RESTARTS   AGE
cluster-autoscaler-0196db54-35b0-73fd-ad5b-14f998751820-7bzwnm9   0/1     ContainerCreating   0          5h15m
nginx-working                                                     1/1     Running             0          53m
  1. New integration tests.
  2. Existing e2e tests already wait for the STS Pods to be Ready which won't happen unless the TLS Secret is there and is valid.

Special notes for your reviewer:

I've considered other approaches here:

  1. Using cert-manager to generate a unique CA per cluster. This makes it more difficult for clients to trust the registries when pushing images to multiple clusters.
  2. Use cert-manager to also generate the TLS certificate, it got pretty complicated to trigger a renewal and needing to wait for it to reconcile the cert Secret. Using cert-manager was also not providing much value as we want better control when the certs get rotated - 118b39e.
  3. Use the CAPI generated CA cert to sign a new cert, but decided against using an externally managed CA for a separate usecase.

This includes TLS fixes in the chart.
@dkoshkin dkoshkin force-pushed the dkoshkin/feat-generate-registry-ca branch 3 times, most recently from 303cc53 to 8bb76fd Compare May 17, 2025 04:15
@github-actions github-actions bot added feature and removed feature labels May 17, 2025
@dkoshkin dkoshkin force-pushed the dkoshkin/feat-generate-registry-ca branch from 8bb76fd to 4fabe1c Compare May 17, 2025 04:19
@github-actions github-actions bot added feature and removed feature labels May 17, 2025
@dkoshkin dkoshkin requested a review from Copilot May 18, 2025 03:47
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Adds end-to-end support for generating and distributing a self-signed CA and TLS certificates for the registry addon, updating both code and Helm charts.

  • Introduces an envtest helper to simulate a remote cluster kubeconfig secret.
  • Refactors secret utilities for local/remote copy and certificate generation.
  • Updates handlers, tests, and charts to wire in CA/TLS lifecycle hooks and bump chart versions.

Reviewed Changes

Copilot reviewed 19 out of 20 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
test/helpers/envtest.go Add WithFakeRemoteClusterClient helper for envtest
pkg/handlers/utils/utils.go Extract GetDeploymentNamespace and simplify TLSConfig logic
pkg/handlers/utils/secrets.go Refactor secret-copy functions into EnsureSecretOnRemoteCluster and EnsureSecretForLocalCluster
pkg/handlers/utils/secrets_integration_test.go Add integration tests for remote/local secret helpers
pkg/handlers/utils/owner_reference.go Add unused owner-reference utility
pkg/handlers/utils/owner_reference_test.go Update copyright year
pkg/handlers/generic/mutation/mirrors/inject.go Refactor containerdConfigFromRegistryAddon for HTTPS and CA
pkg/handlers/generic/mutation/mirrors/inject_test.go Update mirror/tests to use new registry secret helpers
pkg/handlers/generic/lifecycle/registry/utils/utils_suite_test.go Add Ginkgo suite entry for registry/utils
pkg/handlers/generic/lifecycle/registry/utils/tls.go Implement CA/TLS generation and secret distribution logic
pkg/handlers/generic/lifecycle/registry/utils/tls_test.go Unit tests for certificate generation
pkg/handlers/generic/lifecycle/registry/utils/tls_integration_test.go Envtest coverage for CA and TLS secrets on clusters
pkg/handlers/generic/lifecycle/registry/handler.go Wire in BeforeClusterCreate, BeforeClusterUpgrade, and setup hook
pkg/handlers/generic/lifecycle/registry/cncfdistribution/handler.go Use EnsureCASecretForCluster and EnsureTLSCertificateSecretOnRemoteCluster
pkg/handlers/generic/lifecycle/registry/cncfdistribution/handler_test.go Add unit test for DNS names helper
hack/addons/helm-chart-bundler/repos.yaml Bump docker-registry chart version
charts/cluster-api-runtime-extensions-nutanix/templates/helm-config.yaml Sync chart version for cncf-distribution-registry
charts/cluster-api-runtime-extensions-nutanix/templates/certificates.yaml Introduce registry-addon-tls CA Certificate resource
charts/cluster-api-runtime-extensions-nutanix/addons/registry/cncf-distribution/values-template.yaml Change service port to 443, add tlsSecretName
Files not reviewed (1)
  • hack/addons/kustomize/cncf-distribution-registry/kustomization.yaml.tmpl: Language not supported
Comments suppressed due to low confidence (4)

pkg/handlers/generic/lifecycle/registry/handler.go:84

  • AfterControlPlaneInitialized invokes apply without first calling setup, which may skip required pre-setup actions. Add r.setup(ctx, &req.Cluster, commonResponse) before calling r.apply.
r.apply(ctx, &req.Cluster, commonResponse)

test/helpers/envtest.go:168

  • The code uses fmt.Sprintf, corev1.Secret, and context.Background but the imports for "fmt", "context", and corev1 ("k8s.io/api/core/v1") are missing. Add these imports to avoid compilation errors.
Name:      fmt.Sprintf("%s-kubeconfig", cluster.Name),

pkg/handlers/generic/lifecycle/registry/utils/tls.go:199

  • There is no standard cmp.Or function for durations. Replace with a simple conditional: e.g., d := opts.Spec.Duration; if d == 0 { d = defaultCertificateDuration }; NotAfter: time.Now().Add(d).
NotAfter:              time.Now().Add(cmp.Or(opts.Spec.Duration, defaultCertificateDuration)),

pkg/handlers/utils/owner_reference.go:1

  • [nitpick] The owner_reference.go file defines an owner-reference helper that is never referenced or tested elsewhere. Consider removing it or wiring it into called code and adding tests.
// Copyright 2025 Nutanix. All rights reserved.

@dkoshkin dkoshkin force-pushed the dkoshkin/feat-generate-registry-ca branch from 4fabe1c to 24ef0c9 Compare May 19, 2025 15:48
Comment on lines +88 to +89
// Intentionally not using cert-manager to create the certificate,
// as we want to avoid automatic renewal and instead recreate the certificate each time with a new expiration date.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happens in case cluster is not upgraded before expiry date? do we need to manually renew certs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thats correct, it would need to be renewed manually.
This was the reason for a 2 year cert and to always renew the cert in the BeforeClusterUpgrade handler, to minimize the potential of that happening.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, could you comment the reason to avoid automatic renewal? we can maybe set renewal period to 1 or 2 yrs to avoid frequent renewals.

@dkoshkin dkoshkin force-pushed the dkoshkin/feat-generate-registry-ca branch from f04c904 to dffe307 Compare May 27, 2025 20:48
@dkoshkin dkoshkin force-pushed the dkoshkin/feat-generate-registry-ca branch from dffe307 to 9e22cd5 Compare May 27, 2025 21:13
Copy link
Contributor

@manoj-nutanix manoj-nutanix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, LG

stsName = "cncf-distribution-registry-docker-registry"
stsHeadlessServiceName = "cncf-distribution-registry-docker-registry-headless"
stsReplicas = 2
tlsSecretName = "registry-tls"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How the clients of this registry, e.g. konvoy CLI, would fetch client certs before pushing images to this registry?
Will they directly fetch CA from this secret or registry-addon-root-ca ?

IPAddresses: certificateIPAddresses(serviceIP),
},
}
err = utils.EnsureTLSCertificateSecretOnRemoteCluster(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for separate PR: It would be good to provide a dev doc with diagram to demonstrate creation of global root CA, registry CA for a cluster and TLS certificates for a registry and how they will be copied, updated etc.
Also good to document how client can push to this registry.

clusterCASecret := buildClusterCASecret(globalTLSCertificateSecret, cluster)

// Copy the global CA certificate to a cluster CA secret.
err = handlersutils.EnsureSecretForLocalCluster(ctx, c, clusterCASecret, cluster)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you confirm the cluster CA will stay on the management cluster and regenerated every time on upgrade?

//
// Intentionally not using cert-manager to create the certificate,
// as we want to avoid automatic renewal and instead recreate the certificate each time with a new expiration date.
func EnsureTLSCertificateSecretOnRemoteCluster(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit

Suggested change
func EnsureTLSCertificateSecretOnRemoteCluster(
func EnsureRegistryServerCertificateSecretOnRemoteCluster(

Can we explicitly use registry TLS or Server certificates to ensure that these certs are going to be used by the registry pods.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants