clusteradm join fails to add GKE cluster as managed cluster to kind hub and errors in OCM pod logs #958

Open
mattwelke opened this issue Apr 17, 2025 · 3 comments
Labels
bug Something isn't working

Comments

@mattwelke

Describe the bug
I was following the docs on setting up a hub (https://open-cluster-management.io/docs/getting-started/installation/start-the-control-plane/) and registering a cluster (https://open-cluster-management.io/docs/getting-started/installation/register-a-cluster/). My goal was a running hub with a GKE managed cluster joined to it. I didn't care where the hub ran (only that the managed cluster was GKE), so I used kind for the hub.

The clusteradm join command made no progress, repeatedly displaying "Waiting for klusterlet agent to become ready... (UnavailablePods)", and I saw errors in the logs of the two OCM pods.

See the reproduction steps below for details.

To Reproduce
Steps to reproduce the behavior:

  1. Follow https://open-cluster-management.io/docs/getting-started/installation/start-the-control-plane/ to create a hub cluster using kind, until it prints the clusteradm join command containing your token.
  2. Create a GKE cluster (standard, not Autopilot).
  3. Follow https://open-cluster-management.io/docs/getting-started/installation/register-a-cluster/ to join the GKE cluster, using the clusteradm join command from the previous step with --force-internal-endpoint-lookup added (as the page instructs when the hub is a kind cluster). An example command follows.
    clusteradm join --hub-token … \
      --hub-apiserver https://127.0.0.1:40913 \
      --wait \
      --cluster-name … \
      --force-internal-endpoint-lookup \
      --context …
    
  4. Observe that the command installs its components in the managed cluster but then stalls, displaying "Waiting for klusterlet agent to become ready... (UnavailablePods)".
  5. Observe errors like the following in the logs of the pod klusterlet-registration-agent-… (a quick DNS check for this failure is sketched after this list).
    W0417 19:57:49.177215       1 reflector.go:561] k8s.io/client-go@v0.31.4/tools/cache/reflector.go:243: failed to list *v1.CertificateSigningRequest: Get "https://kind-hub-control-plane:6443/apis/certificates.k8s.io/v1/certificatesigningrequests?limit=500&resourceVersion=0": dial tcp: lookup kind-hub-control-plane on 34.118.224.10:53: no such host                                                                                                    
    E0417 19:57:49.177301       1 reflector.go:158] "Unhandled Error" err="k8s.io/client-go@v0.31.4/tools/cache/reflector.go:243: Failed to watch *v1.CertificateSigningRequest: failed to list *v1.CertificateSigningRequest: Get \"https://kind-hub-control-plane:6443/apis/certificates.k8s.io/v1/certificatesigningrequests?limit=500&resourceVersion=0\": dial tcp: lookup kind-hub-control-plane on 34.118.224.10:53: no such host"
    
  6. Observe errors like the following in the logs of the pod klusterlet-….
    I0417 19:55:50.736639       1 manager.go:245] crd appliedmanifestworks.work.open-cluster-management.io is updated to version 0.16.0
    I0417 19:55:50.814229       1 manager.go:245] crd clusterclaims.cluster.open-cluster-management.io is updated to version 0.16.0
    E0417 19:55:50.911006       1 base_controller.go:277] "Unhandled Error" err="\"KlusterletStatusController\" controller failed to sync \"klusterlet\", err: Operation cannot be fulfilled on klusterlets.operator.open-cluster-management.io \"klusterlet\": the object has been modified; please apply your changes to the latest version and try again"
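
The registration agent errors in step 5 point at the root cause: the agent tries to resolve the hub's kind-internal hostname (kind-hub-control-plane) through the GKE cluster's DNS (34.118.224.10), which has no record for it. The klusterlet error in step 6 ("the object has been modified") looks like an ordinary optimistic-concurrency conflict that the operator retries, and is likely a red herring. The lookup failure can be reproduced from inside the managed cluster with a one-off pod; this is a diagnostic sketch, with the context name as a placeholder:

    # Run a throwaway pod in the managed cluster and look up the hub's kind hostname.
    # Expect "no such host" here, matching the agent logs above.
    kubectl --context <gke-context> run dns-check --rm -it --restart=Never \
      --image=busybox:1.36 -- nslookup kind-hub-control-plane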
    

Expected behavior
The clusteradm join command completes, the GKE cluster is successfully joined to the hub, and at least one ManagedCluster resource exists on the hub cluster after these steps finish.

Environment (i.e. OCM version, Kubernetes version and provider):

$ clusteradm version
client		version	:v0.11.0-0-g73281f6
server release	version	:v1.31.6-gke.1020000
default bundle	version	:0.16.0

Additional context

$ kubectl version
Client Version: v1.32.1
Kustomize Version: v5.5.0
Server Version: v1.31.6-gke.1020000
@mattwelke mattwelke added the bug Something isn't working label Apr 17, 2025

dosubot bot commented Apr 17, 2025

To resolve the issue where the clusteradm join command fails to add a GKE cluster as a managed cluster to a kind hub, resulting in "Waiting for klusterlet agent to become ready... (UnavailablePods)" and errors in the OCM pod logs, ensure you use the --force-internal-endpoint-lookup flag when joining the cluster. The command should be:

clusteradm join --force-internal-endpoint-lookup --wait --hub-token <hub-token> --hub-apiserver <hub-apiserver> --cluster-name <cluster-name> --context <managed-cluster-context>

Make sure to set the --context flag to specify the context of your managed cluster. The --force-internal-endpoint-lookup flag is necessary when using KinD clusters to set up OCM. Additionally, ensure that your network settings allow outbound connections from the managed cluster to the hub cluster's API server. If you are using a proxy, verify that it is configured to allow these connections.

After the OCM agent is running on your managed cluster, it will send a “handshake” to your hub cluster and wait for approval from the hub cluster admin. Follow these steps to accept the join request and verify the setup:

  1. Wait for the creation of the CSR object by your managed cluster's OCM agents on the hub cluster:

    kubectl get csr -w --context ${CTX_HUB_CLUSTER} | grep <cluster-name>
  2. Accept the join request using the clusteradm tool:

    clusteradm accept --clusters <cluster-name> --context ${CTX_HUB_CLUSTER}
  3. Verify the installation of the OCM agents on your managed cluster:

    kubectl -n open-cluster-management-agent get pod --context ${CTX_MANAGED_CLUSTER}
  4. Verify that the <cluster-name> ManagedCluster object was created successfully:

    kubectl get managedcluster --context ${CTX_HUB_CLUSTER}

Ensure that the ManagedCluster object shows HUB ACCEPTED as true and JOINED and AVAILABLE as True [1][2].
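
For reference, a successfully joined cluster would look roughly like the following; the cluster name, URL, and age are illustrative, not taken from this issue:

    $ kubectl get managedcluster --context ${CTX_HUB_CLUSTER}
    NAME       HUB ACCEPTED   MANAGED CLUSTER URLS     JOINED   AVAILABLE   AGE
    cluster1   true           https://<api-endpoint>   True     True        5m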


@mikeshng
Member

--hub-apiserver https://127.0.0.1:40913 \

The GKE cluster needs to reach out and connect to the hub API server. So this API address needs to be valid.

The documentation probably needs to be fixed. It probably assumes that if your hub is a KinD cluster, then your managed cluster is KinD as well.
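
One way to sanity-check reachability before joining is to hit the hub API server's /healthz endpoint from a pod inside the managed cluster. This is a sketch; the context and hub address are placeholders, and the address must be one the GKE cluster can actually route to:

    # Expect "ok" (or at least a TLS response) if the hub API server is reachable.
    kubectl --context <gke-context> run hub-check --rm -it --restart=Never \
      --image=curlimages/curl -- curl -ks https://<hub-apiserver>/healthz

A 127.0.0.1 address can only succeed from the machine running the kind container itself, which is consistent with the join hanging at "UnavailablePods" here.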

@mattwelke
Author

mattwelke commented Apr 17, 2025

Thanks! That was an oversight on my part. I knew from colleagues using OCM that the flag tells the managed cluster how to contact the hub cluster, but I didn't realize that in my setup the hub and spoke wouldn't be able to communicate at all, since my hub cluster was running on my laptop.

I created #959 after running through it again, this time with two GKE clusters (I assumed they would be able to reach each other's control planes, since I can reach either over its public IP from my computer), and getting a different error message.

I believe the way to close out this issue would be, as you proposed, added documentation in the getting started section clarifying the network architecture and what happens if someone uses kind for one, or all, of the clusters.
