Dev: migration: implement corosync.conf migration for corosync 3 (jsc#PED-8252) #1422

Merged: 39 commits into ClusterLabs:master on Mar 12, 2025

@nicholasyang2022 (Collaborator) commented May 16, 2024

This pull request implements migrating corosync.conf from corosync 2 to corosync 3, including:

  • migrate the deprecated transports udpu and udp to knet
  • replace the removed RRP feature with knet multilink
  • if crypto_hash is set to sha1, the default value in corosync 2, upgrade it to sha256, the new default in corosync 3
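
The transport and crypto_hash rules above can be sketched as follows. This is a minimal illustration, not the code in crmsh/migration.py: it assumes corosync.conf has already been parsed into nested dicts, and `migrate_totem` and `DEPRECATED_TRANSPORTS` are hypothetical names. RRP handling is left out.

```python
import copy

# Transports that the migration replaces with knet (per the list above).
DEPRECATED_TRANSPORTS = {"udp", "udpu"}


def migrate_totem(conf):
    """Return (new_conf, messages) for a config modeled as nested dicts.

    The input is not modified. An already migrated config yields no messages,
    so calling this twice is harmless.
    """
    new_conf = copy.deepcopy(conf)
    totem = new_conf.setdefault("totem", {})
    messages = []

    # Only touch a transport that is explicitly set to a deprecated value.
    if totem.get("transport") in DEPRECATED_TRANSPORTS:
        totem["transport"] = "knet"
        messages.append("Upgrade totem.transport to knet.")

    # sha1 is upgraded to sha256; any other hash is left as configured.
    if totem.get("crypto_hash") == "sha1":
        totem["crypto_hash"] = "sha256"
        messages.append('Upgrade totem.crypto_hash from "sha1" to "sha256".')

    return new_conf, messages


if __name__ == "__main__":
    old = {"totem": {"transport": "udpu", "crypto_hash": "sha1"}}
    new, log = migrate_totem(old)
    for line in log:
        print("INFO:", line)
```

Returning the list of changes alongside the new config makes the "already migrated" case easy to detect: an empty list means there is nothing to write back.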

Use Cases

Check When the Cluster Is Ready to Migrate

> sudo crm cluster health sles16
------ node: localhost ------
[INFO] Checking dependency version...
[INFO] Checking used corosync features...
[WARN] Corosync transport "udpu" will be deprecated in corosync 3. Please use knet.

------ cib ------
[WARN] The CIB is not validated with the latest schema version.
       * Latest version:  3.10
       * Current version: 3.9
[INFO] Checking used resource agents...

------ node: ha-2-2 ------
[WARN] Corosync transport "udpu" will be deprecated in corosync 3. Please use knet.

****** summary ******
[INFO] Please run "crm cluster health sles16 --fix" on on any one of above nodes.
[PASS] This cluster is good to migrate to SLES 16.

Check When Already Migrated

> sudo crm cluster health sles16
------ node: localhost ------
[INFO] Checking dependency version...
[INFO] Checking used corosync features...

------ cib ------
[INFO] Checking used resource agents...

------ node: ha-2-2 ------

****** summary ******
[INFO] This cluster works on SLES 16. No migration is needed.

Run Migration

suse@ha-2-1:~> sudo crm cluster health sles16 --fix
------ node: localhost ------
[INFO] Checking dependency version...
[INFO] Checking used corosync features...
[WARN] Corosync transport "udpu" will be deprecated in corosync 3. Please use knet.

------ cib ------
[WARN] The CIB is not validated with the latest schema version.
       * Latest version:  3.10
       * Current version: 3.9
[INFO] Checking used resource agents...

------ node: ha-2-2 ------
[WARN] Corosync transport "udpu" will be deprecated in corosync 3. Please use knet.

INFO: Starting migration...
INFO: Migrating corosync configuration...
INFO: Upgrade totem.transport to knet.
INFO: Upgrade totem.crypto_hash from "sha1" to "sha256".
INFO: Finish migrating corosync configuration. The original configuration is renamed to corosync.conf.bak
INFO: Finished migration.
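
The log above says the original file is kept as corosync.conf.bak. One way to do such a replacement is sketched below. It is an assumption about the general technique, not a description of what crmsh does internally, and `replace_with_backup` is a hypothetical helper name.

```python
import os
import shutil
import tempfile


def replace_with_backup(path, new_text):
    """Write new_text to path and keep the previous file as path + '.bak'."""
    backup = path + ".bak"
    directory = os.path.dirname(os.path.abspath(path))

    # Write the new content to a temporary file in the same directory first,
    # so a failure while writing leaves the original file untouched.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".migrate-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(new_text)
        # mkstemp creates the file with mode 0600; copy the original's mode.
        shutil.copymode(path, tmp_path)
    except BaseException:
        os.unlink(tmp_path)
        raise

    os.replace(path, backup)    # original becomes corosync.conf.bak
    os.replace(tmp_path, path)  # new file takes the original name
    return backup
```

Both renames stay inside one directory, so each is a single rename on the same filesystem.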

Try to Run Migration When Already Migrated

suse@ha-2-1:~> sudo crm cluster health sles16 --fix
------ node: localhost ------
[INFO] Checking dependency version...
[INFO] Checking used corosync features...

------ cib ------
[WARN] The CIB is not validated with the latest schema version.
       * Latest version:  3.10
       * Current version: 3.9
[INFO] Checking used resource agents...

------ node: ha-2-2 ------

INFO: This cluster works on SLES 16 with some warnings. Please fix the remaining warnings manually.
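
Read together, the four sample outputs suggest a simple ordering of outcomes. The sketch below is one reading of those samples, not the logic of `_check_impl()`: the three finding categories, the `Finding` enum, and the returned labels are all hypothetical.

```python
from enum import Enum


class Finding(Enum):
    FIXABLE = "fixable"  # can be repaired by --fix, e.g. transport "udpu"
    MANUAL = "manual"    # warning left for the user, e.g. the CIB schema version
    FATAL = "fatal"      # a [FAIL] result


def summarize(findings):
    """Map the findings collected from all nodes to one summary label."""
    kinds = set(findings)
    if Finding.FATAL in kinds:
        return "blocked"
    if Finding.FIXABLE in kinds:
        return "run --fix"
    if Finding.MANUAL in kinds:
        return "works with warnings"
    return "no migration needed"


if __name__ == "__main__":
    # First use case: udpu warning plus CIB schema warning.
    print(summarize([Finding.FIXABLE, Finding.MANUAL]))
    # Last use case: only the CIB schema warning remains.
    print(summarize([Finding.MANUAL]))
```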

codecov bot commented May 16, 2024

Codecov Report

Attention: Patch coverage is 82.65896% with 90 lines in your changes missing coverage. Please review.

Project coverage is 69.97%. Comparing base (0f77021) to head (2c7b93f).
Report is 49 commits behind head on master.

Files with missing lines   Patch %   Lines
crmsh/migration.py         82.37%    77 Missing ⚠️
crmsh/ui_cluster.py        73.68%    10 Missing ⚠️
crmsh/cibquery.py          90.00%    3 Missing ⚠️
Additional details and impacted files
Flag          Coverage Δ
integration   54.16% <82.08%> (+0.55%) ⬆️
unit          52.61% <27.36%> (-0.48%) ⬇️

Files with missing lines   Coverage Δ
crmsh/corosync.py          87.64% <100.00%> (ø)
crmsh/prun/runner.py       64.86% <100.00%> (+0.97%) ⬆️
crmsh/sh.py                92.50% <ø> (+0.41%) ⬆️
crmsh/utils.py             66.33% <100.00%> (-0.16%) ⬇️
crmsh/xmlutil.py           69.24% <100.00%> (+0.03%) ⬆️
crmsh/cibquery.py          90.00% <90.00%> (ø)
crmsh/ui_cluster.py        74.81% <73.68%> (-0.63%) ⬇️
crmsh/migration.py         82.37% <82.37%> (ø)

... and 2 files with indirect coverage changes

nicholasyang2022 force-pushed the ped-8252-20240516 branch 5 times, most recently from 3c620cf to 2d4c6db (May 27, 2024 08:05)
nicholasyang2022 force-pushed the ped-8252-20240516 branch 2 times, most recently from fdcdc65 to c6cd605 (October 23, 2024 03:52)
nicholasyang2022 force-pushed the ped-8252-20240516 branch 3 times, most recently from 04ea10f to 69020cc (November 1, 2024 04:21)
nicholasyang2022 force-pushed the ped-8252-20240516 branch 5 times, most recently from fa10732 to 68bd445 (January 2, 2025 09:14)

@liangxin1300 (Collaborator)

In this output:

# crm cluster health sles16
------ localhost ------
[INFO] Checking dependency version...
[FAIL] Pacemaker version not supported
       Supported version: 3 <= Pacemaker
       Actual version:    Pacemaker == 2.1.9

The line "Supported version: 3 <= Pacemaker" would read better as "Supported version: Pacemaker >= 3".
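
A minimal sketch of a version check that prints the constraint in the suggested order, with the component name first. The helper names `parse_version` and `check_minimum` are hypothetical, and the parser only handles plain dotted numeric versions such as 2.1.9.

```python
def parse_version(text):
    """Turn '2.1.9' into (2, 1, 9) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def check_minimum(name, actual, minimum):
    """Return (ok, lines); lines holds the report text when the check fails."""
    ok = parse_version(actual) >= parse_version(minimum)
    if ok:
        return True, []
    return False, [
        f"{name} version not supported",
        f"Supported version: {name} >= {minimum}",
        f"Actual version:    {name} == {actual}",
    ]


if __name__ == "__main__":
    ok, lines = check_minimum("Pacemaker", "2.1.9", "3")
    for line in lines:
        print(line)
```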

@liangxin1300 (Collaborator)

# crm cluster health sles16
------ localhost ------
[INFO] Checking dependency version...
[FAIL] Pacemaker version not supported
       Supported version: 3 <= Pacemaker
       Actual version:    Pacemaker == 2.1.9
[INFO] Checking service status...
[FAIL] Cluster services are running
       * corosync
       * pacemaker
[INFO] Checking used corosync features...

------ alp-3 ------



------ alp-2 ------

There is no output for the remote nodes.

@zzhou1 (Contributor) commented Jan 6, 2025

[FAIL] Cluster services are running
* corosync
* pacemaker

Reporting this as FAIL, i.e. as a fatal error, sounds too strong to me.

Whether the environment is corosync 2 or corosync 3, and whether or not the cluster is running, there is nothing wrong with running crm cluster health sles16.

nicholasyang2022 force-pushed the ped-8252-20240516 branch 2 times, most recently from 486020b to 16cd739 (January 23, 2025 07:31)
nicholasyang2022 force-pushed the ped-8252-20240516 branch 6 times, most recently from 9e1b95b to 38d9896 (February 13, 2025 03:18)

…PED-11808)

should use recursive queries

(cherry picked from commit 4b7e42d)
…ilesystem_with_fstype

(cherry picked from commit 48aca45)
as it is not a hard requirement to upgrade pacemaker 2 to 3.
…jsc#PED-8252)

* should not run the migration if already migrated
* should not tell users to run `crm cluster health sles16 --fix` if
  already migrated
so that we can report an error without blocking migration.
…PED-8252)

Whether or not the cluster is running, there is nothing wrong with running
the check or fix.
… some non-fatal problems which need manual fix (jsc#PED-8252)

also refactor the return code of _check_impl()
…cluster (jsc#PED-8252)

crmsh generates an incorrect bindnetaddr when joining a corosync 2 multicast cluster. This should be fixed before it is used in the nodelist.
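
For context, bindnetaddr in a corosync 2 interface section is conventionally the network address of the subnet the interface sits on. The commit message does not say which incorrect value crmsh produced, so the sketch below only shows how that network address can be derived from an interface address. `bindnetaddr_for` is a hypothetical helper, not a crmsh function.

```python
import ipaddress


def bindnetaddr_for(ip_with_prefix):
    """Return the network address of the subnet containing the given address.

    ip_with_prefix is an interface address with prefix length,
    e.g. '192.168.122.17/24'.
    """
    iface = ipaddress.ip_interface(ip_with_prefix)
    return str(iface.network.network_address)


if __name__ == "__main__":
    print(bindnetaddr_for("192.168.122.17/24"))
    print(bindnetaddr_for("10.0.37.5/20"))
```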

@liangxin1300 (Collaborator) left a comment

Great work
Thanks!

liangxin1300 merged commit ee7aaf4 into ClusterLabs:master on Mar 12, 2025
33 checks passed
nicholasyang2022 deleted the ped-8252-20240516 branch on May 9, 2025 05:06