Skip to content
This repository has been archived by the owner on Aug 19, 2024. It is now read-only.

Refactor operator logic #70

Closed
wants to merge 59 commits into from
Closed

Refactor operator logic #70

wants to merge 59 commits into from

Conversation

gazarenkov
Copy link
Member

@gazarenkov gazarenkov commented Dec 13, 2023

As stated in #69 with more comments here
the goal of this PR is to improve design, supportability and testablity of Operator.
For this purpose all the configuration complexity , forming Desired State of Custom Resource (see 3 layers of configuration) was moved out of Controller to the Model.

Description

Configuration logic moved out from controller package to the model package, tests added accordingly.

Which issue(s) does this PR fix or relate to

#69

PR acceptance criteria

Integration and (new) unit tests passed.

# Conflicts:
#	.gitignore
#	controllers/backstage_controller.go
#	controllers/backstage_controller_test.go
#	controllers/backstage_deployment.go
#	controllers/backstage_service.go
#	controllers/local_db_deployment.go
#	controllers/local_db_storage.go
# Conflicts:
#	api/v1alpha1/backstage_types.go
#	config/crd/bases/janus-idp.io_backstages.yaml
#	controllers/backstage_backend_auth.go
#	controllers/backstage_controller.go
#	controllers/backstage_controller_test.go
#	controllers/local_db_statefulset.go
@rm3l rm3l mentioned this pull request Dec 20, 2023
@jianrongzhang89
Copy link
Contributor

@gazarenkov Now I got an error when deploying examples/bs1.yaml. Can you take a look?

status:
  conditions:
  - lastTransitionTime: "2024-01-25T16:52:07Z"
    message: 'failed to apply backstage objects failed to update object bs1-route:
      routes.route.openshift.io "bs1-route" is invalid: metadata.resourceVersion:
      Invalid value: 0x0: must be specified for an update'
    reason: DeployFailed
    status: "False"
    type: Deployed

@jianrongzhang89
Copy link
Contributor

@gazarenkov I found that only one service is deployed for the local db. The db statefulset requires a headless service (with serviceName in the statefulset spec) but the headless service is missing. Can you double check?

if err != nil {
return fmt.Errorf("failed to read YAML: %w", err)

if err := r.Update(ctx, obj.Object()); err != nil {
Copy link
Contributor

@jianrongzhang89 jianrongzhang89 Jan 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simply calling update() will cause unexpected result because the existing k8s object's meta data information (such as labels, annotations) may get lost. Administrators often may want to add additional labels or annotations in order to manage the cluster resources.
You should consider to use the CreateOrUpdate() library call used currently, or you may have to implement the logic to merge the configuration before calling update.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, if we want Backstage owning the objects (which is the case) the only way to reshape runtime objects is to change default/raw/spec configuration, that is the desired state

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we do want to have runtime objects to be based on our configuration, however we should not prevent admins to add the labels/annotations to them, and this is often done as automation scripts. k8s may inject additional information to the metadata. This will be a serious limitation which will not get accepted by the field.

@gazarenkov
Copy link
Member Author

@gazarenkov I found that only one service is deployed for the local db. The db statefulset requires a headless service (with serviceName in the statefulset spec) but the headless service is missing. Can you double check?

If you consider it important please make sure it is added in test suite, it is not that obvious w/o ability to test it.

@gazarenkov
Copy link
Member Author

gazarenkov commented Jan 26, 2024

@gazarenkov Now I got an error when deploying examples/bs1.yaml. Can you take a look?

status:
  conditions:
  - lastTransitionTime: "2024-01-25T16:52:07Z"
    message: 'failed to apply backstage objects failed to update object bs1-route:
      routes.route.openshift.io "bs1-route" is invalid: metadata.resourceVersion:
      Invalid value: 0x0: must be specified for an update'
    reason: DeployFailed
    status: "False"
    type: Deployed

Do you mean this failed for you?
https://github.com/janus-idp/operator/blob/main/examples/bs1.yaml

@rm3l
Copy link
Member

rm3l commented Jan 26, 2024

@gazarenkov Now I got an error when deploying examples/bs1.yaml. Can you take a look?

status:
  conditions:
  - lastTransitionTime: "2024-01-25T16:52:07Z"
    message: 'failed to apply backstage objects failed to update object bs1-route:
      routes.route.openshift.io "bs1-route" is invalid: metadata.resourceVersion:
      Invalid value: 0x0: must be specified for an update'
    reason: DeployFailed
    status: "False"
    type: Deployed

Do you mean this failed for you? main/examples/bs1.yaml

Got the same issue after applying bs1.yaml. Note that all the resources (including the route) are created and work as expected. Only the CR status contains this error message, which could be misleading I guess..

if err := r.reconcileBackstageService(ctx, &backstage, req.Namespace); err != nil {
return ctrl.Result{}, err
err = r.applyObjects(ctx, bsModel.Objects)
if err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Errors of conflicting objects are temporary. They will be cleared in next reconciliations and should not cause the CR status to get updated.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand...We can hope it to be recovered, but status is about current state, which is not desired one I believe (we want it to be deleted but it is not).

Copy link
Contributor

@jianrongzhang89 jianrongzhang89 Jan 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This kind of conflicting errors are vey common in operators and all they need to do is to retry the reconciliation. They are typically self-recoverable so they should not cause un-necessary status updates.

Example on this is handled currently in main branch:
https://github.com/janus-idp/operator/blob/main/controllers/backstage_deployment.go#L155

Copy link

Quality Gate Passed Quality Gate passed

The SonarCloud Quality Gate passed, but some issues were introduced.

30 New issues
0 Security Hotspots
No data about Coverage
0.9% Duplication on New Code

See analysis details on SonarCloud

@jianrongzhang89
Copy link
Contributor

@gazarenkov I found that only one service is deployed for the local db. The db statefulset requires a headless service (with serviceName in the statefulset spec) but the headless service is missing. Can you double check?

If you consider it important please make sure it is added in test suite, it is not that obvious w/o ability to test it.

Without such service, it will be confusing for users - the serviceName is set in the statefulset, but the actual service is missing.

This is already in the test cases in main, see https://github.com/janus-idp/operator/blob/main/controllers/backstage_controller_test.go#L396.
In addition, the helm chart deploys both the regular service and the headless service for the local db too.

Please make sure you do not delete existing test cases during refactoring. Otherwise makes us difficult to validate the changes. IMO, refactoring should not even modify existing test cases so that we know none of existing test cases get broken.

@openshift-merge-robot
Copy link

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@rm3l rm3l added the do-not-merge/hold PR should not merge yet because someone has issued a /hold command. Required by Prow. label Feb 2, 2024
@gazarenkov gazarenkov closed this Feb 6, 2024
@gazarenkov gazarenkov removed needs-rebase do-not-merge/hold PR should not merge yet because someone has issued a /hold command. Required by Prow. labels Feb 6, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Layered operator design
5 participants