MCO-1615: Mco node degraded mcn condition #29684

Open: pablintino wants to merge 1 commit into main from mco-node-degraded-mcn-condition


pablintino

The MCN API has a condition that signals a failure during the update of a MachineConfigNode. The condition was already part of the API, but the MCO was not handling it.
Now that the MCO handles the condition, our tests need to check that it is properly set and unset when needed.

openshift-ci bot added the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) on Apr 15, 2025
Contributor

openshift-ci bot commented Apr 15, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

pablintino changed the title from "Mco node degraded mcn condition" to "MCO-1615: Mco node degraded mcn condition" on Apr 15, 2025
openshift-ci-robot added the jira/valid-reference label (indicates that this PR references a valid Jira ticket of any type) on Apr 15, 2025
@openshift-ci-robot

openshift-ci-robot commented Apr 15, 2025

@pablintino: This pull request references MCO-1615 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.19.0" version, but no target version was set.

In response to this:

The MCN API has a condition that signals a failure during the update of a MachineConfigNode. The condition was already part of the API, but the MCO was not handling it.
Now that the MCO handles the condition, our tests need to check that it is properly set and unset when needed.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-merge-robot added the needs-rebase label (indicates a PR cannot be merged because it has merge conflicts with HEAD) on Apr 19, 2025
pablintino force-pushed the mco-node-degraded-mcn-condition branch from 27b2923 to 3bd651f on April 21, 2025 10:23
openshift-merge-robot removed the needs-rebase label (indicates a PR cannot be merged because it has merge conflicts with HEAD) on Apr 21, 2025
openshift-ci bot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) on Apr 21, 2025
pablintino marked this pull request as ready for review on April 21, 2025 10:23
openshift-ci bot removed the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) on Apr 21, 2025
openshift-ci bot requested review from cheesesashimi and djoshy on April 21, 2025 10:25
@isabella-janssen
Member

isabella-janssen commented Apr 21, 2025

/test e2e-aws-ovn-microshift

Running this test to check its stability.

openshift-merge-robot added the needs-rebase label (indicates a PR cannot be merged because it has merge conflicts with HEAD) on Apr 27, 2025
pablintino force-pushed the mco-node-degraded-mcn-condition branch from 3bd651f to 52f6ff9 on April 28, 2025 09:59
openshift-merge-robot removed the needs-rebase label (indicates a PR cannot be merged because it has merge conflicts with HEAD) on Apr 28, 2025
@pablintino
Author

/retest-required


openshift-trt bot commented Apr 29, 2025

Job Failure Risk Analysis for sha: 52f6ff9

Job Name Failure Risk
pull-ci-openshift-origin-main-e2e-aws IncompleteTests
Tests for this run (17) are below the historical average (2320): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-aws-csi IncompleteTests
Tests for this run (17) are below the historical average (1383): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-aws-disruptive IncompleteTests
Tests for this run (17) are below the historical average (1058): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-aws-ovn IncompleteTests
Tests for this run (17) are below the historical average (2424): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-aws-ovn-cgroupsv2 IncompleteTests
Tests for this run (17) are below the historical average (2012): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-aws-ovn-edge-zones IncompleteTests
Tests for this run (19) are below the historical average (1991): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-aws-ovn-etcd-scaling IncompleteTests
Tests for this run (17) are below the historical average (1058): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-aws-ovn-fips IncompleteTests
Tests for this run (17) are below the historical average (1821): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-aws-ovn-kube-apiserver-rollout IncompleteTests
Tests for this run (17) are below the historical average (1483): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-aws-ovn-microshift IncompleteTests
Tests for this run (15) are below the historical average (1278): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-aws-ovn-microshift-serial IncompleteTests
Tests for this run (15) are below the historical average (695): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-1of2 IncompleteTests
Tests for this run (17) are below the historical average (1213): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-2of2 IncompleteTests
Tests for this run (17) are below the historical average (1140): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-aws-ovn-single-node IncompleteTests
Tests for this run (17) are below the historical average (1796): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-aws-ovn-single-node-serial IncompleteTests
Tests for this run (17) are below the historical average (1323): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-aws-ovn-single-node-upgrade IncompleteTests
Tests for this run (18) are below the historical average (3421): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-aws-ovn-upgrade IncompleteTests
Tests for this run (19) are below the historical average (1504): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-aws-proxy IncompleteTests
Tests for this run (18) are below the historical average (2553): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-azure-ovn-upgrade IncompleteTests
Tests for this run (18) are below the historical average (4005): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-gcp-ovn IncompleteTests
Tests for this run (18) are below the historical average (3032): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)

Showing the analysis for 20 of 26 jobs

The MCN API has a condition that signals a failure during the update of
a MachineConfigNode. The condition was already part of the API, but the
MCO was not handling it. Now that the MCO handles the condition, our
tests need to check that it is properly set and unset when needed.
pablintino force-pushed the mco-node-degraded-mcn-condition branch from 52f6ff9 to 16317c0 on April 29, 2025 09:02
@pablintino
Author

/retest

@pablintino
Author

/test e2e-aws-ovn-serial

Contributor

openshift-ci bot commented Apr 30, 2025

@pablintino: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

/test e2e-aws-jenkins
/test e2e-aws-ovn-edge-zones
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-image-registry
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-ovn
/test e2e-gcp-ovn-builds
/test e2e-gcp-ovn-image-ecosystem
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi
/test images
/test lint
/test okd-scos-images
/test unit
/test verify
/test verify-deps

The following commands are available to trigger optional jobs:

/test 4.12-upgrade-from-stable-4.11-e2e-aws-ovn-upgrade-rollback
/test e2e-agnostic-ovn-cmd
/test e2e-aws
/test e2e-aws-csi
/test e2e-aws-disruptive
/test e2e-aws-etcd-certrotation
/test e2e-aws-etcd-recovery
/test e2e-aws-ovn
/test e2e-aws-ovn-cgroupsv2
/test e2e-aws-ovn-etcd-scaling
/test e2e-aws-ovn-ipsec-serial
/test e2e-aws-ovn-kube-apiserver-rollout
/test e2e-aws-ovn-kubevirt
/test e2e-aws-ovn-single-node
/test e2e-aws-ovn-single-node-serial
/test e2e-aws-ovn-single-node-techpreview
/test e2e-aws-ovn-single-node-techpreview-serial
/test e2e-aws-ovn-single-node-upgrade
/test e2e-aws-ovn-upgrade
/test e2e-aws-ovn-upgrade-rollback
/test e2e-aws-ovn-upi
/test e2e-aws-ovn-virt-techpreview
/test e2e-aws-proxy
/test e2e-azure
/test e2e-azure-ovn-etcd-scaling
/test e2e-azure-ovn-upgrade
/test e2e-baremetalds-kubevirt
/test e2e-external-aws
/test e2e-external-aws-ccm
/test e2e-external-vsphere-ccm
/test e2e-gcp-csi
/test e2e-gcp-disruptive
/test e2e-gcp-fips-serial
/test e2e-gcp-ovn-etcd-scaling
/test e2e-gcp-ovn-rt-upgrade
/test e2e-gcp-ovn-techpreview
/test e2e-gcp-ovn-techpreview-serial
/test e2e-gcp-ovn-usernamespace
/test e2e-hypershift-conformance
/test e2e-metal-ipi-ovn
/test e2e-metal-ipi-ovn-dualstack
/test e2e-metal-ipi-ovn-dualstack-bgp-local-gw-techpreview
/test e2e-metal-ipi-ovn-dualstack-bgp-techpreview
/test e2e-metal-ipi-ovn-dualstack-local-gateway
/test e2e-metal-ipi-ovn-kube-apiserver-rollout
/test e2e-metal-ipi-serial
/test e2e-metal-ipi-serial-ovn-ipv6
/test e2e-metal-ipi-virtualmedia
/test e2e-metal-ovn-single-node-live-iso
/test e2e-metal-ovn-single-node-with-worker-live-iso
/test e2e-metal-ovn-two-node-arbiter
/test e2e-openstack-ovn
/test e2e-openstack-serial
/test e2e-vsphere-ovn-dualstack-primaryv6
/test e2e-vsphere-ovn-etcd-scaling
/test okd-e2e-gcp
/test okd-scos-e2e-aws-ovn

Use /test all to run the following jobs that were automatically triggered:

pull-ci-openshift-origin-main-4.12-upgrade-from-stable-4.11-e2e-aws-ovn-upgrade-rollback
pull-ci-openshift-origin-main-e2e-agnostic-ovn-cmd
pull-ci-openshift-origin-main-e2e-aws
pull-ci-openshift-origin-main-e2e-aws-csi
pull-ci-openshift-origin-main-e2e-aws-disruptive
pull-ci-openshift-origin-main-e2e-aws-ovn
pull-ci-openshift-origin-main-e2e-aws-ovn-cgroupsv2
pull-ci-openshift-origin-main-e2e-aws-ovn-edge-zones
pull-ci-openshift-origin-main-e2e-aws-ovn-etcd-scaling
pull-ci-openshift-origin-main-e2e-aws-ovn-fips
pull-ci-openshift-origin-main-e2e-aws-ovn-kube-apiserver-rollout
pull-ci-openshift-origin-main-e2e-aws-ovn-microshift
pull-ci-openshift-origin-main-e2e-aws-ovn-microshift-serial
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-1of2
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-2of2
pull-ci-openshift-origin-main-e2e-aws-ovn-single-node
pull-ci-openshift-origin-main-e2e-aws-ovn-single-node-serial
pull-ci-openshift-origin-main-e2e-aws-ovn-single-node-upgrade
pull-ci-openshift-origin-main-e2e-aws-ovn-upgrade
pull-ci-openshift-origin-main-e2e-aws-proxy
pull-ci-openshift-origin-main-e2e-azure
pull-ci-openshift-origin-main-e2e-azure-ovn-etcd-scaling
pull-ci-openshift-origin-main-e2e-azure-ovn-upgrade
pull-ci-openshift-origin-main-e2e-gcp-csi
pull-ci-openshift-origin-main-e2e-gcp-disruptive
pull-ci-openshift-origin-main-e2e-gcp-fips-serial
pull-ci-openshift-origin-main-e2e-gcp-ovn
pull-ci-openshift-origin-main-e2e-gcp-ovn-etcd-scaling
pull-ci-openshift-origin-main-e2e-gcp-ovn-rt-upgrade
pull-ci-openshift-origin-main-e2e-gcp-ovn-upgrade
pull-ci-openshift-origin-main-e2e-hypershift-conformance
pull-ci-openshift-origin-main-e2e-metal-ipi-ovn
pull-ci-openshift-origin-main-e2e-metal-ipi-ovn-dualstack
pull-ci-openshift-origin-main-e2e-metal-ipi-ovn-dualstack-bgp-local-gw-techpreview
pull-ci-openshift-origin-main-e2e-metal-ipi-ovn-dualstack-bgp-techpreview
pull-ci-openshift-origin-main-e2e-metal-ipi-ovn-dualstack-local-gateway
pull-ci-openshift-origin-main-e2e-metal-ipi-ovn-ipv6
pull-ci-openshift-origin-main-e2e-metal-ipi-ovn-kube-apiserver-rollout
pull-ci-openshift-origin-main-e2e-metal-ipi-serial
pull-ci-openshift-origin-main-e2e-metal-ipi-serial-ovn-ipv6
pull-ci-openshift-origin-main-e2e-metal-ipi-virtualmedia
pull-ci-openshift-origin-main-e2e-openstack-ovn
pull-ci-openshift-origin-main-e2e-openstack-serial
pull-ci-openshift-origin-main-e2e-vsphere-ovn
pull-ci-openshift-origin-main-e2e-vsphere-ovn-dualstack-primaryv6
pull-ci-openshift-origin-main-e2e-vsphere-ovn-etcd-scaling
pull-ci-openshift-origin-main-e2e-vsphere-ovn-upi
pull-ci-openshift-origin-main-images
pull-ci-openshift-origin-main-lint
pull-ci-openshift-origin-main-okd-e2e-gcp
pull-ci-openshift-origin-main-okd-scos-e2e-aws-ovn
pull-ci-openshift-origin-main-okd-scos-images
pull-ci-openshift-origin-main-unit
pull-ci-openshift-origin-main-verify
pull-ci-openshift-origin-main-verify-deps

In response to this:

/test e2e-aws-ovn-serial

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@djoshy
Contributor

djoshy commented Apr 30, 2025

/lgtm

This looks good to me, but the test isn't part of a suite. I'm happy to merge it as is and evaluate issues as they arise once we create a suite for it.

openshift-ci bot added the lgtm label (indicates that a PR is ready to be merged) on Apr 30, 2025
Contributor

openshift-ci bot commented Apr 30, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: djoshy, pablintino

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@isabella-janssen
Member

LGTM too! Thanks @pablintino

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD ce68c31 and 2 for PR HEAD 16317c0 in total

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD ce68c31 and 2 for PR HEAD 16317c0 in total

Contributor

openshift-ci bot commented May 6, 2025

@pablintino: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-azure-ovn-upgrade 16317c0 link false /test e2e-azure-ovn-upgrade
ci/prow/4.12-upgrade-from-stable-4.11-e2e-aws-ovn-upgrade-rollback 16317c0 link false /test 4.12-upgrade-from-stable-4.11-e2e-aws-ovn-upgrade-rollback
ci/prow/e2e-gcp-fips-serial 16317c0 link false /test e2e-gcp-fips-serial
ci/prow/e2e-aws-ovn-etcd-scaling 16317c0 link false /test e2e-aws-ovn-etcd-scaling
ci/prow/okd-e2e-gcp 16317c0 link false /test okd-e2e-gcp
ci/prow/e2e-aws-disruptive 16317c0 link false /test e2e-aws-disruptive
ci/prow/e2e-openstack-serial 16317c0 link false /test e2e-openstack-serial
ci/prow/e2e-gcp-ovn-etcd-scaling 16317c0 link false /test e2e-gcp-ovn-etcd-scaling
ci/prow/okd-scos-e2e-aws-ovn 16317c0 link false /test okd-scos-e2e-aws-ovn
ci/prow/e2e-gcp-disruptive 16317c0 link false /test e2e-gcp-disruptive
ci/prow/e2e-azure-ovn-etcd-scaling 16317c0 link false /test e2e-azure-ovn-etcd-scaling
ci/prow/e2e-vsphere-ovn-dualstack-primaryv6 16317c0 link false /test e2e-vsphere-ovn-dualstack-primaryv6
ci/prow/e2e-vsphere-ovn-etcd-scaling 16317c0 link false /test e2e-vsphere-ovn-etcd-scaling
ci/prow/e2e-aws-ovn-serial 16317c0 link true /test e2e-aws-ovn-serial
ci/prow/e2e-metal-ipi-ovn-ipv6 16317c0 link true /test e2e-metal-ipi-ovn-ipv6

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.


openshift-trt bot commented May 6, 2025

Job Failure Risk Analysis for sha: 16317c0

Job Name Failure Risk
pull-ci-openshift-origin-main-e2e-aws-disruptive Medium
[bz-Etcd] clusteroperator/etcd should not change condition/Available
Potential external regression detected for High Risk Test analysis
---
[sig-node] static pods should start after being created
Potential external regression detected for High Risk Test analysis
pull-ci-openshift-origin-main-e2e-aws-ovn-serial IncompleteTests
Tests for this run (105) are below the historical average (1371): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-azure-ovn-etcd-scaling Low
[bz-Cloud Compute] clusteroperator/control-plane-machine-set should not change condition/Degraded
This test has passed 0.00% of 1 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:rare Network:ovn NetworkStack:ipv4 Owner:eng Platform:azure SecurityMode:default Topology:ha Upgrade:none] in the last week.

Open Bugs
etcd-scaling jobs failing ~60% of the time
pull-ci-openshift-origin-main-e2e-gcp-disruptive High
[sig-node] static pods should start after being created
This test has passed 98.74% of 5014 runs on release 4.20 [Overall] in the last week.
---
[sig-arch][Late] operators should not create watch channels very often
This test has passed 99.52% of 4833 runs on release 4.20 [Overall] in the last week.

Open Bugs
ResilientWatchCacheInitialization (Re)enablement - operator watch counts from component readiness
operators should not create watch channels very often regression
---
[bz-Monitoring] clusteroperator/monitoring should not change condition/Degraded
This test has passed 98.56% of 5014 runs on release 4.20 [Overall] in the last week.
---
[bz-Monitoring] clusteroperator/monitoring should not change condition/Available
This test has passed 98.78% of 5014 runs on release 4.20 [Overall] in the last week.
pull-ci-openshift-origin-main-e2e-gcp-ovn-etcd-scaling High
[bz-etcd][invariant] alert/etcdMembersDown should not be at or above info
This test has passed 99.96% of 4937 runs on release 4.20 [Overall] in the last week.

Open Bugs
etcd-scaling jobs failing ~60% of the time
pull-ci-openshift-origin-main-e2e-vsphere-ovn-etcd-scaling IncompleteTests
Tests for this run (22) are below the historical average (1488): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)

@pablintino
Author

/retest-required

Labels
approved: Indicates a PR has been approved by an approver from all required OWNERS files.
jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
lgtm: Indicates that a PR is ready to be merged.
5 participants