ci: Run E2E tests against all versions of K8s we claim support for #397
Conversation
I see a couple of issues here:
- It is not clear what problem we are solving, so the "status" of this testing is not clear either. It refers to OCP versions, which are clearly based on certain K8s versions, but the tests run against K8s (sure), so they cannot be taken as OCP compatibility tests anyway.
- It definitely brings some useful compatibility information for K8s X.Y versions, while the Z version can basically be "any" (no breaking changes expected). Taking into account that K8s releases every 3-4 months, making this nightly sounds like overkill to me.
- In my opinion, running them nightly just overloads the Actions output (and compute resources :) ) with mostly the same information.
So I would say it is a useful test, but we probably need a clearer definition of it. For example: an upstream pre-release test that makes sure we support the supportable K8s versions, run as part of the pre-release procedure rather than nightly.
I would ideally run those tests directly on OCP if it were as easy as a GitHub Action, but I couldn't find a simple way to spin up a minimal OCP cluster in a GitHub runner. So this uses the OCP versions we claim support for as baselines to determine the versions of K8s we could/should test against.
The variable here is the operator code (which can change frequently), and I think we should have real E2E tests for it, but against the versions of K8s that we should support (a list that won't change frequently either).
The Actions output is separated per job, i.e., per configuration in the matrix, so it won't look that different from the existing nightly workflow; we just have new jobs from the matrix to look at.
The goal of the nightly workflow is to prevent a last-minute rush before a release, when you notice something does not work end-to-end, by regularly running a comprehensive suite of tests against the released image. This way, based on your last nightly job status, you can even release any good commit at any time. Are you concerned about having to check the status too frequently?
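For illustration only, the kind of per-version matrix discussed above could look like the sketch below. The cluster tool (kind), the version list, and the `make test-e2e` target are assumptions made for the example, not necessarily what this PR actually configures.

```yaml
# Illustrative sketch only: the cluster tool (kind), the version list, and the
# `make test-e2e` target are assumptions, not the actual workflow in this PR.
name: E2E compatibility tests (sketch)
on:
  workflow_dispatch: {}
jobs:
  e2e:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false          # keep testing the other versions if one fails
      matrix:
        # Hypothetical K8s versions, derived from the OCP releases we claim support for
        k8s-version: ["v1.25.16", "v1.26.15", "v1.27.13"]
    steps:
      - uses: actions/checkout@v4
      - name: Create a kind cluster running this K8s version
        uses: helm/kind-action@v1
        with:
          node_image: kindest/node:${{ matrix.k8s-version }}
      - name: Run the E2E suite against the cluster
        run: make test-e2e
```

Each matrix entry becomes its own job, so the results stay separated per K8s version in the Actions UI, which is the point made above about the output per job.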
You can set up tests on OpenShift CI to run against OpenShift clusters. There are lots of examples of how to do this in the github.com/openshift org; specifically, look at the https://github.com/openshift/release repository.
Yeah, I know, and that would be a potential next step (if we agree on the approach here). For now, this tries to leverage the existing GH Workflow we have for testing against vanilla K8s.
Like I said, the compatibility tests (K8s and/or OCP if possible) are useful. At the same time, in the vast majority of cases they do not bring any useful information daily, so they will very quickly be perceived as "white noise". As a related example, look at the current state of the nightly tests: they fail in many cases for a known reason, but we do not care, because they are not that important and pretty much everything is covered in the other (PR) flows. From the outside, though, it looks like a broken environment. That's why I really doubt the usefulness of daily tests in our context(!) at all. So I see such tests rather as part of a (conscious) [pre/post]-release process/recommendation than as a periodic job. I do not really understand your "last-minute rush" concern; there is time to focus on development and time to focus on the release ("code freeze" state in our case). Could you please be more specific about the cases you have in mind?
PR tests currently do not run against a real cluster to make sure that everything works end-to-end. So I am not confident that a merged PR with the current tests passing guarantees that the deployed application is actually reachable (which is what matters to the end user, right?).
Is there such a process in place? A recommendation is only useful if there is a way to enforce it, and a nightly job is a relatively cheap and automated way to enforce that. ;)
If we want to be able to release at any time, nightly tests will help, for example.
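As a possible middle ground between running these checks nightly and running them only around releases, the same workflow could expose both a scheduled and a manual trigger. This is just a sketch; the cron expression is an arbitrary example, not something taken from this PR.

```yaml
# Sketch of trigger configuration only; the cron expression is an arbitrary example.
name: E2E compatibility tests (sketch)
on:
  schedule:
    - cron: "0 2 * * *"    # nightly, e.g. at 02:00 UTC
  workflow_dispatch: {}    # on demand, e.g. as part of a pre-release checklist
```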
I see this compatibility testing as good to have for sure, but:
The state of the current nightly tests in general is not satisfactory. They have not been stable for quite a long time and so look broken.
#399
That's why extending them with any new functionality is clearly a no-go from my side.
@rm3l The point about not having "real cluster tests" on CI (except the ones from nightly) is valid, and we have to solve it ASAP.
Beyond that, I do not see any reason to keep the nightly jobs until we are sure we are able to properly maintain them and see a real need for them.
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by:
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment
I already stated that they are not passing (only since the past few days) because of #390, which was recently merged. The issue (thankfully caught quickly by the nightly checks) is being tracked in this JIRA: https://issues.redhat.com/browse/RHIDP-3157 Anyway, let's bring this discussion to one of our WG calls to get more opinions on how to approach this. /hold
Looks like we are talking about different issues. Take a look here: https://github.com/janus-idp/operator/actions?page=2&query=event%3Aschedule and you can see that the majority of the nightly builds have failed.
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Description
To help us detect potential issues in advance, this expands the E2E configuration matrix to test several versions of K8s:
Which issue(s) does this PR fix or relate to
—
PR acceptance criteria
rhdh-operator.csv.yaml file accordingly
How to test changes / Special notes to the reviewer
Example of a run here: https://github.com/rm3l/janus-idp-operator/actions/runs/9809691738
The failures are not related to the changes here, but to an issue with the default janus-idp/backstage-showcase image's latest tag being outdated; the recently-merged #390 only works in 1.2+. See https://redhat-internal.slack.com/archives/C04CUSD4JSG/p1720191475569659