
[release-4.18] OCPBUGS-54594: update bootloader on aarch64 systems #1795


Merged


dustymabe
Member

The aarch64 kernel changed its file format [1] [2], and older
RHEL8-based systems (4.12 and 4.11) need to update the bootloader;
otherwise the system won't boot when it gets upgraded to 4.19,
which is based on RHEL 9.6.

Let's add a systemd unit here that will update the bootloader.
We also need to add code to handle the RAID case, because
bootupd doesn't currently handle it.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=2162369
[2] https://issues.redhat.com/browse/RHEL-25537

Fixes: https://issues.redhat.com/browse/OCPBUGS-54594
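
For illustration, a minimal sketch of what such a unit could look like. The unit name and Description text match the service logs later in this thread; everything else here (paths, conditions, ordering) is an assumption, not the actual unit from this PR:

# coreos-bootupctl-update-aarch64.service (illustrative sketch only)
[Unit]
Description=Update Bootloader for aarch64 systems
# assumption: gate on architecture so the unit is a no-op elsewhere
ConditionArchitecture=aarch64

[Service]
Type=oneshot
RemainAfterExit=yes
# hypothetical path; the script name matches the journal output below
ExecStart=/usr/libexec/coreos-update-bootloader

[Install]
WantedBy=multi-user.target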

Contributor

openshift-ci bot commented Apr 8, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 8, 2025
@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Apr 8, 2025
@openshift-ci-robot

@dustymabe: This pull request references Jira Issue OCPBUGS-54594, which is invalid:

  • expected the bug to target the "4.18.z" version, but no target version was set
  • release note text must be set and not match the template OR release note type must be set to "Release Note Not Required". For more information you can reference the OpenShift Bug Process.
  • expected Jira Issue OCPBUGS-54594 to depend on a bug targeting a version in 4.19.0 and in one of the following states: VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), CLOSED (DONE-ERRATA), but no dependents were found

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.


@dustymabe
Member Author

Creating this as a draft because I need to do some more testing on it.

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 8, 2025
@HuijingHei
Contributor

We also need to add code to handle the RAID case, because
bootupd doesn't currently handle it.

Actually, coreos/bootupd#855 fixes that; would you like to review it when you have time? Thanks!

@dustymabe dustymabe force-pushed the dusty-OCPBUGS-54594 branch from 2ea6f99 to 9695bfb Compare April 9, 2025 16:03
@dustymabe dustymabe force-pushed the dusty-OCPBUGS-54594 branch from d2eec2d to d6e2ead Compare April 9, 2025 21:34
@dustymabe dustymabe marked this pull request as ready for review April 9, 2025 22:10
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 9, 2025
@openshift-ci openshift-ci bot requested review from cgwalters and jlebon April 9, 2025 22:10
@dustymabe
Member Author

OK, this is ready for review now.

@dustymabe
Member Author

We also need to add code to handle the RAID case, because
bootupd doesn't currently handle it.

Actually, coreos/bootupd#855 fixes that; would you like to review it when you have time? Thanks!

Correct! Unfortunately we needed this to go into 4.18 to fix the bootloader on aarch64 systems before they attempt to upgrade to 4.19, so we couldn't use your new work. Next time!

@dustymabe
Member Author

/label backport-risk-assessed
/label cherry-pick-approved

@openshift-ci openshift-ci bot added backport-risk-assessed Indicates a PR to a release branch has been evaluated and considered safe to accept. cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. labels Apr 11, 2025
Member

@travier travier left a comment


/lgtm
/hold
Let's make sure we get a lot of testing on this one.

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 14, 2025
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Apr 14, 2025
@dustymabe dustymabe force-pushed the dusty-OCPBUGS-54594 branch from d6e2ead to e85157a Compare April 14, 2025 20:39
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Apr 14, 2025
@HuijingHei
Contributor

Built 4.18-9.4 with the patch on aarch64, started from 412.86.202402272018-0 and upgraded to the built version, then tested two scenarios following pointers from Timothee and Dusty; the results look good.

  • RHCOS system with RAID1

a) Started 4.12 with a mirror setup (refer to the docs); hit an issue, and the workaround was to remove the bios- part in the Ignition config.

b) After the upgrade, the bootupctl status output is inconsistent (because /boot/bootupd-state.json is not synced), but the update actually did happen.

  • RHCOS system with classic filesystem layout but with more than one ESP (may need manual setup on another disk, for example)

a) Started 4.12 with an additional disk and set up the same ESP label EFI-SYSTEM on the second disk:

echo 'label: gpt
size=512M, type=C12A7328-F81F-11D2-BA4B-00A0C93EC93B, name=EFI-SYSTEM
size=1G, type=0FC63DAF-8483-4772-8E79-3D69D8477DE4, name=boot
type=0FC63DAF-8483-4772-8E79-3D69D8477DE4' | sudo sfdisk /dev/vda

mkfs.fat -F32 /dev/vda1
mkfs.ext4 /dev/vda2
mkfs.ext4 /dev/vda3

mount /dev/vda1 /media/
mount /dev/vdb2 /boot/efi
cp -ar /boot/efi/* /media/
umount /media

[root@cosa-devsh core]# udevadm info /dev/vda1 | grep '^S: disk/by-partlabel'
S: disk/by-partlabel/EFI-SYSTEM

[root@cosa-devsh core]# udevadm info /dev/vdb2 | grep '^S: disk/by-partlabel'
S: disk/by-partlabel/EFI-SYSTEM

b) After the upgrade, coreos-bootupctl-update-aarch64.service ran successfully:

[core@cosa-devsh ~]$ sudo journalctl -b -u coreos-bootupctl-update-aarch64.service
Apr 16 10:52:37 localhost systemd[1]: Starting Update Bootloader for aarch64 systems...
Apr 16 10:52:37 localhost coreos-update-bootloader[1036]: Found ESP; calling 'bootupctl update'
Apr 16 10:52:37 cosa-devsh coreos-update-bootloader[1080]: Updated EFI: grub2-efi-aa64-1:2.06-86.el9_4.2.aarch64,shim-aa64-15.8-4.el9_3.aarch64
Apr 16 10:52:37 cosa-devsh systemd[1]: Finished Update Bootloader for aarch64 systems.

@dustymabe
Member Author

RHCOS system with classic filesystem layout but with more than one ESP (may need manual setup on another disk, for example)

So your test isn't 100% what I tested. Let me explain the difference.

Instead of manually partitioning a disk, I just ran coreos-installer and targeted the second disk in the system. What you did should work too, I think.
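
Roughly, the invocation was something like this (illustrative only; the target device and Ignition file name here are hypothetical):

sudo coreos-installer install /dev/vdb --ignition-file config.ign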

For some reason the output you're seeing isn't hitting the code path I think it should. In this case it should be doing the manual copy rather than calling bootupctl, but I'm not sure why.

Can you try doing the steps where you partition the second disk while you are still on 4.12 (before you do the upgrade)?

Member

@jlebon jlebon left a comment


Some comments, but looks sane overall. Nice work!


# Find the device the boot filesystem is mounted from
# (i.e. /dev/md126 or /dev/sda3).
boot_fs_device=$(findmnt --json --target /boot | jq -r .filesystems[0].source)
Member


Hmm, isn't this just

Suggested change
boot_fs_device=$(findmnt --json --target /boot | jq -r .filesystems[0].source)
boot_fs_device=$(findmnt -no SOURCE /boot)

?

which I think is more common in the rest of our code.
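
Both forms should print the same device; for example, with the /dev/vdb3 layout from the test logs further down (outputs shown here are assumed):

$ findmnt --json --target /boot | jq -r '.filesystems[0].source'
/dev/vdb3
$ findmnt -no SOURCE /boot
/dev/vdb3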

Member Author


Will update in the next upload.

boot_fs_partition_json=$(
jq --arg boot_fs_device "${boot_fs_device}" -r '
[
.blockdevices[].children[]
Member


I think this should be

Suggested change
.blockdevices[].children[]
.blockdevices[].children[] // []

in case there are devices that don't have children (e.g. unpartitioned disks).

Member Author


I've updated it to:

.blockdevices[].children[]?

which handles the case where a block device doesn't have any children.
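
The difference is easy to see with a toy input (hypothetical single-disk example; error text abbreviated):

$ echo '{"blockdevices":[{"name":"/dev/vdc"}]}' | jq '.blockdevices[].children[]'
jq: error: Cannot iterate over null (null)
$ echo '{"blockdevices":[{"name":"/dev/vdc"}]}' | jq '.blockdevices[].children[]?'
$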

Comment on lines 39 to 41
(.fstype == "vfat") and
(.label != null) and
(.label | startswith("esp"))
Member


Member Author


Will update in the next upload.

@HuijingHei
Contributor

HuijingHei commented Apr 17, 2025

For some reason the output you're seeing isn't hitting the code path I think it should. In this case it should be doing the manual copy rather than calling bootupctl, but I'm not sure why.

Sorry for the misunderstanding. I tried again using coreos-installer to install to the second disk, and the result looks sane with the manual copy (I see "Falling back to manual copy"): the script detects that there are 2 ESPs and only does the manual copy to the disk that has the /boot mountpoint, which is what we expect.

When using the manual setup, the service sometimes failed (not always):
the lsblk --paths --fs --json output shows the partlabel for /dev/vda2 as null, not EFI-SYSTEM (need to find out why), so the script sees only one ESP and calls bootupctl update. If in that case /dev/disk/by-partlabel/EFI-SYSTEM -> /dev/vda2, bootupctl fails; but if /dev/disk/by-partlabel/EFI-SYSTEM -> /dev/vdb2, bootupctl succeeds.

For bootupctl, we should change it so it does not rely on the EFI-SYSTEM partlabel in this case.
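
The ESP count the script computes can be reproduced by hand with the same jq filter that appears in the journal trace below:

$ lsblk --paths --fs --json | jq '[.blockdevices[] | select(.children[]?.label == "EFI-SYSTEM")] | length'
1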

  • Before the manual setup, /dev/disk/by-partlabel/EFI-SYSTEM -> /dev/vdb2:
[root@cosa-devsh core]# ls -al /dev/disk/by-partlabel/
total 0
drwxr-xr-x. 2 root root 120 Apr 17 03:16 .
drwxr-xr-x. 8 root root 160 Apr 17 03:16 ..
lrwxrwxrwx. 1 root root  10 Apr 17 03:16 EFI-SYSTEM -> ../../vdb2
lrwxrwxrwx. 1 root root  10 Apr 17 03:16 boot -> ../../vdb3
lrwxrwxrwx. 1 root root  10 Apr 17 03:16 reserved -> ../../vdb1
lrwxrwxrwx. 1 root root  10 Apr 17 03:16 root -> ../../vdb4
  • Manually set up the partition table to match the primary disk:
sfdisk -d /dev/vdb > diskpart
sfdisk /dev/vda < diskpart

mkfs.fat -F16 /dev/vda2
mkfs.ext4 /dev/vda3
mkfs.xfs /dev/vda4
  • After the manual setup, /dev/disk/by-partlabel/EFI-SYSTEM -> /dev/vda2:
[root@cosa-devsh core]# ls -al /dev/disk/by-partlabel/
total 0
drwxr-xr-x. 2 root root 120 Apr 17 06:42 .
drwxr-xr-x. 8 root root 160 Apr 17 06:40 ..
lrwxrwxrwx. 1 root root  10 Apr 17 06:42 EFI-SYSTEM -> ../../vda2
lrwxrwxrwx. 1 root root  10 Apr 17 06:42 boot -> ../../vda3
lrwxrwxrwx. 1 root root  10 Apr 17 06:42 reserved -> ../../vda1
lrwxrwxrwx. 1 root root  10 Apr 17 06:42 root -> ../../vda4

[root@cosa-devsh core]# blkid --cache-file /dev/null  /dev/vda2
/dev/vda2: SEC_TYPE="msdos" UUID="2F15-4870" BLOCK_SIZE="512" TYPE="vfat" PARTLABEL="EFI-SYSTEM" PARTUUID="f8c5dddf-f696-409a-85bd-83c4746aa265"
  • After upgrading and rebooting, coreos-bootupctl-update-aarch64.service failed, with /dev/disk/by-partlabel/EFI-SYSTEM -> /dev/vda2:
[root@cosa-devsh core]# rpm-ostree rebase --experimental ostree-unverified-image:oci-archive:/var/mnt/workdir/builds/9.4.202504170628-0/aarch64/rhcos-9.4.202504170628-0-ostree.aarch64.ociarchive

[root@cosa-devsh core]# reboot
Last login: Thu Apr 17 06:40:50 2025
[systemd]
Failed Units: 1
  coreos-bootupctl-update-aarch64.service

[core@cosa-devsh ~]$ journalctl -b -u coreos-bootupctl-update-aarch64.service
Apr 17 06:51:47 localhost systemd[1]: Starting Update Bootloader for aarch64 systems...
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]: + main
Apr 17 06:51:47 localhost coreos-update-bootloader[1051]: ++ lsblk --paths --fs --json
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]: + block_devices_json='{
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:    "blockdevices": [
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:       {
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:          "name": "/dev/vda",
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:          "fstype": null,
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:          "fsver": null,
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:          "label": null,
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:          "uuid": null,
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:          "fsavail": null,
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:          "fsuse%": null,
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:          "mountpoints": [
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:              null
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:          ],
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:          "children": [
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:             {
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                "name": "/dev/vda1",
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                "fstype": null,
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                "fsver": null,
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                "label": null,
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                "uuid": null,
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                "fsavail": null,
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                "fsuse%": null,
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                "mountpoints": [
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                    null
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                ]
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:             },{
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                "name": "/dev/vda2",
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                "fstype": "vfat",
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                "fsver": "FAT16",
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                "label": null,
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                "uuid": "2F15-4870",
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                "fsavail": null,
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                "fsuse%": null,
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                "mountpoints": [
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                    null
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                ]
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:             },{
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                "name": "/dev/vda3",
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                "fstype": "ext4",
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                "fsver": "1.0",
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                "label": null,
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                "uuid": "7dd5db5e-ca8a-48b9-ab9f-5bf4bb5334d0",
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                "fsavail": null,
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                "fsuse%": null,
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                "mountpoints": [
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                    null
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                ]
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:             },{
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                "name": "/dev/vda4",
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                "fstype": "xfs",
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                "fsver": null,
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                "label": null,
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                "uuid": "21db17b1-6912-4b9a-8e3c-b6aebf747c07",
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                "fsavail": null,
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                "fsuse%": null,
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                "mountpoints": [
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                    null
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                ]
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:             }
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:          ]
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:       },{
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:          "name": "/dev/vdb",
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:          "fstype": null,
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:          "fsver": null,
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:          "label": null,
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:          "uuid": null,
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:          "fsavail": null,
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:          "fsuse%": null,
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:          "mountpoints": [
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:              null
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:          ],
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:          "children": [
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:             {
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                "name": "/dev/vdb1",
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                "fstype": null,
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                "fsver": null,
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                "label": null,
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                "uuid": null,
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                "fsavail": null,
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                "fsuse%": null,
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                "mountpoints": [
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                    null
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                ]
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:             },{
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                "name": "/dev/vdb2",
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                "fstype": "vfat",
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                "fsver": "FAT16",
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                "label": "EFI-SYSTEM",
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                "uuid": "063C-C7CF",
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                "fsavail": null,
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                "fsuse%": null,
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                "mountpoints": [
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                    null
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                ]
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:             },{
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                "name": "/dev/vdb3",
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                "fstype": "ext4",
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                "fsver": "1.0",
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                "label": "boot",
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                "uuid": "1f2160b6-a16f-4fd1-aaf2-92ee9cc15777",
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                "fsavail": "155.7M",
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                "fsuse%": "49%",
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                "mountpoints": [
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                    "/boot"
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                ]
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:             },{
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                "name": "/dev/vdb4",
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                "fstype": "xfs",
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                "fsver": null,
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                "label": "root",
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                "uuid": "5f6618f8-ddfd-4977-8fc7-8992d065df3d",
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                "fsavail": "10.3G",
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                "fsuse%": "33%",
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                "mountpoints": [
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                    "/var", "/sysroot/ostree/deploy/rhcos/var", "/sysroot", "/usr", "/etc", "/"
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:                ]
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:             }
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:          ]
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:       }
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:    ]
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]: }'
Apr 17 06:51:47 localhost coreos-update-bootloader[1063]: ++ findmnt --json --target /boot
Apr 17 06:51:47 localhost coreos-update-bootloader[1066]: ++ jq -r '.filesystems[0].source'
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]: + boot_fs_device=/dev/vdb3
Apr 17 06:51:47 localhost coreos-update-bootloader[1070]: ++ jq --arg boot_fs_device /dev/vdb3 -r '
Apr 17 06:51:47 localhost coreos-update-bootloader[1070]:             [
Apr 17 06:51:47 localhost coreos-update-bootloader[1070]:             .blockdevices[].children[]
Apr 17 06:51:47 localhost coreos-update-bootloader[1070]:             | select(
Apr 17 06:51:47 localhost coreos-update-bootloader[1070]:                 .name == $boot_fs_device or
Apr 17 06:51:47 localhost coreos-update-bootloader[1070]:                 .children[]?.name == $boot_fs_device
Apr 17 06:51:47 localhost coreos-update-bootloader[1070]:             )
Apr 17 06:51:47 localhost coreos-update-bootloader[1070]:             ] | .[0]'
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]: + boot_fs_partition_json='{
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:   "name": "/dev/vdb3",
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:   "fstype": "ext4",
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:   "fsver": "1.0",
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:   "label": "boot",
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:   "uuid": "1f2160b6-a16f-4fd1-aaf2-92ee9cc15777",
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:   "fsavail": "155.7M",
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:   "fsuse%": "49%",
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:   "mountpoints": [
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:     "/boot"
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]:   ]
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]: }'
Apr 17 06:51:47 localhost coreos-update-bootloader[1076]: ++ jq -r .fstype
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]: + boot_fs_partition_fstype=ext4
Apr 17 06:51:47 localhost coreos-update-bootloader[1083]: ++ jq -r '
Apr 17 06:51:47 localhost coreos-update-bootloader[1083]:         [
Apr 17 06:51:47 localhost coreos-update-bootloader[1083]:         .blockdevices[]
Apr 17 06:51:47 localhost coreos-update-bootloader[1083]:         | select(.children[]?.label == "EFI-SYSTEM")
Apr 17 06:51:47 localhost coreos-update-bootloader[1083]:         ] | length'
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]: + num_efi_system_devices=1
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]: + '[' ext4 == linux_raid_member ']'
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]: + '[' 1 -gt 1 ']'
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]: + echo 'Found ESP; calling '\''bootupctl update'\'''
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]: Found ESP; calling 'bootupctl update'
Apr 17 06:51:47 localhost coreos-update-bootloader[1040]: + bootupctl update -vvvvv
Apr 17 06:51:47 localhost coreos-update-bootloader[1087]: [TRACE bootupd] executing cli
Apr 17 06:51:47 localhost.localdomain coreos-update-bootloader[1087]: error: internal error: Failed to update EFI: opening EFI dir: No such file or directory (os error 2)
Apr 17 06:51:47 localhost.localdomain systemd[1]: coreos-bootupctl-update-aarch64.service: Main process exited, code=exited, status=1/FAILURE
Apr 17 06:51:47 localhost.localdomain systemd[1]: coreos-bootupctl-update-aarch64.service: Failed with result 'exit-code'.
Apr 17 06:51:47 localhost.localdomain systemd[1]: Failed to start Update Bootloader for aarch64 systems.

[core@cosa-devsh ~]$ ls /dev/disk/by-partlabel/ -al
total 0
drwxr-xr-x. 2 root root 120 Apr 17 06:51 .
drwxr-xr-x. 9 root root 180 Apr 17 06:51 ..
lrwxrwxrwx. 1 root root  10 Apr 17 06:51 EFI-SYSTEM -> ../../vda2
lrwxrwxrwx. 1 root root  10 Apr 17 06:51 boot -> ../../vdb3
lrwxrwxrwx. 1 root root  10 Apr 17 06:51 reserved -> ../../vda1
lrwxrwxrwx. 1 root root  10 Apr 17 06:51 root -> ../../vda4

@dustymabe dustymabe force-pushed the dusty-OCPBUGS-54594 branch from e85157a to 2604ee6 Compare April 17, 2025 16:09
The aarch64 kernel changed its file format [1] [2], and older
RHEL8-based systems (4.12 and 4.11) need to update the bootloader;
otherwise the system won't boot when it gets upgraded to 4.19,
which is based on RHEL 9.6.

Let's add a systemd unit here that will update the bootloader.
We also need to add code to handle the RAID case, because
bootupd doesn't currently handle it.

It's worth mentioning that we hit this upstream in Fedora CoreOS
as well [3], and we took the fix from that [4] and another
similar issue [5] as inspiration here.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=2162369
[2] https://issues.redhat.com/browse/RHEL-25537
[3] coreos/fedora-coreos-tracker#1441
[4] coreos/fedora-coreos-config#2308
[5] coreos/fedora-coreos-config#3042

Fixes: https://issues.redhat.com/browse/OCPBUGS-54594
@dustymabe dustymabe force-pushed the dusty-OCPBUGS-54594 branch from 2604ee6 to 7401561 Compare April 17, 2025 16:18
@dustymabe
Member Author

When using the manual setup, the service sometimes failed (not always):
the lsblk --paths --fs --json output shows the partlabel for /dev/vda2 as null, not EFI-SYSTEM (need to find out why),

I just changed the script to not rely on the label at all and to use the parttype instead.
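
An illustrative check of the parttype-based detection (the jq filter is quoted from the review excerpt below; the lsblk column selection and the device names are assumptions):

$ lsblk --paths --json -o NAME,FSTYPE,PARTTYPE | \
    jq -r '.blockdevices[]
           | select(.children[]?.name == "/dev/vdb3")
           | .children[]
           | select(.parttype == "c12a7328-f81f-11d2-ba4b-00a0c93ec93b")
           | .name'
/dev/vdb2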

Contributor

openshift-ci bot commented Apr 17, 2025

@dustymabe: all tests passed!


@jlebon
Member

jlebon commented Apr 17, 2025

/approve
/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Apr 17, 2025
Contributor

openshift-ci bot commented Apr 17, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dustymabe, jlebon, travier

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [dustymabe,jlebon,travier]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@dustymabe
Member Author

I finished testing on this and it looks good.

@HuijingHei
Contributor

When using the manual setup, the service sometimes failed (not always):
the lsblk --paths --fs --json output shows the partlabel for /dev/vda2 as null, not EFI-SYSTEM (need to find out why),

I just changed the script to not rely on the label at all and to use the parttype instead.

I still can't find out why the lsblk --paths --fs --json output shows the partlabel for /dev/vda2 as null, but parttype works anyway.

I tested the following scenarios, and the results look good.

  • RHCOS system with classic filesystem layout
    --- calls bootupctl

  • RHCOS system with RAID1
    --- calls update_raid_esp()

  • RHCOS system with classic filesystem layout but with more than one ESP (using coreos-installer)
    --- falls back to manual copy, calls copy_to_esp_device()

  • RHCOS system with classic filesystem layout but with more than one ESP (using manual setup)
    --- falls back to manual copy, calls copy_to_esp_device() (see the illustrative sketch below)
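
Purely for illustration, a manual-copy fallback could look something like this (this is not the actual copy_to_esp_device() from the PR; the device name is hypothetical, and the payload path assumes bootupd's usual update layout):

# mount the ESP on the same disk as /boot and copy in the
# bootloader payload shipped by bootupd (illustrative only)
esp_device=/dev/vdb2   # hypothetical: the ESP next to /boot
mount "${esp_device}" /mnt
cp -r /usr/lib/bootupd/updates/EFI /mnt/
umount /mnt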

.blockdevices[]
| select(.children[]?.name == $boot_fs_device)
| .children[]
| select(.parttype == "c12a7328-f81f-11d2-ba4b-00a0c93ec93b")
Contributor


Overall LGTM, just a minor suggestion: can we make this a variable, like this

Member Author


That's a good suggestion.

I think in this case, since we've already tested the code as-is and this script only lives in RHCOS for 4.18, let's save some effort and not change it.

Contributor


LGTM

@dustymabe
Member Author

Thank you for testing everything @HuijingHei.

Now all we need is to figure out how to satisfy the Jira bot, and then this can merge.

@mike-nguyen
Member

/jira refresh

@openshift-ci-robot

@mike-nguyen: This pull request references Jira Issue OCPBUGS-54594, which is invalid:

  • release note text must be set and not match the template OR release note type must be set to "Release Note Not Required". For more information you can reference the OpenShift Bug Process.

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.


@mike-nguyen
Member

@dustymabe I fixed most of it. The release note for the bug is the only thing left, so if you could set it, this will merge. Thanks for your work on this!

@dustymabe
Member Author

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Apr 18, 2025
@openshift-ci-robot

@dustymabe: This pull request references Jira Issue OCPBUGS-54594, which is valid. The bug has been moved to the POST state.

7 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.18.z) matches configured target version for branch (4.18.z)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)
  • release note text is set and does not match the template
  • dependent bug Jira Issue OCPBUGS-55144 is in the state Verified, which is one of the valid states (VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), CLOSED (DONE-ERRATA))
  • dependent Jira Issue OCPBUGS-55144 targets the "4.19.0" version, which is one of the valid target versions: 4.19.0
  • bug has dependents

Requesting review from QA contact:
/cc @mike-nguyen


@openshift-ci openshift-ci bot requested a review from mike-nguyen April 18, 2025 13:02
@dustymabe
Member Author

/unhold

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 18, 2025
@openshift-merge-bot openshift-merge-bot bot merged commit 63b5e6c into openshift:release-4.18 Apr 18, 2025
5 checks passed
@openshift-ci-robot

@dustymabe: Jira Issue OCPBUGS-54594: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-54594 has been moved to the MODIFIED state.

