-
Notifications
You must be signed in to change notification settings - Fork 122
[release-4.18] OCPBUGS-54594: update bootloader on aarch64 systems #1795
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[release-4.18] OCPBUGS-54594: update bootloader on aarch64 systems #1795
Conversation
Skipping CI for Draft Pull Request. |
@dustymabe: This pull request references Jira Issue OCPBUGS-54594, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
creating as draft because I need to do some more testing on this. |
Actually coreos/bootupd#855 is to fix that, would you like to review when you have time? Thanks! |
2ea6f99
to
9695bfb
Compare
d2eec2d
to
d6e2ead
Compare
ok this is ready for review now. |
correct! unfortunately we needed this to go into 4.18 to fix the bootloader for aarch64 systems before they attempt to upgrade to 4.19 so we couldn't use your new work, but next time! |
/label backport-risk-assessed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/hold
Let's make sure we get a lot of testing on this one.
d6e2ead
to
e85157a
Compare
Build 4.18-9.4 with the patch on aarch64, start 412.86.202402272018-0 and upgrade to the built version, do testing with 2 scenarios according to pointer from Timothee and Dusty, and result looks good.
a) Start 412 with mirror setup( refer to doc ), hit issue, workaround is remove b) After upgrade, and check
a) Start 412 with an additional disk, and setup the same ESP label
b) After upgrade, check
|
So your test isn't 100% what I tested. Let me explain the difference. Instead of manually partitioning a disk I just ran For some reason the output you are seeing isn't hitting the case that I think it should in the code. It should be manually copying and not using Can you try doing the steps where you partition the second disk when you are still in 4.12 (before you do the upgrade)? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments, but looks sane overall. Nice work!
|
||
# Find the device the boot filesystem is mounted from | ||
# (i.e. /dev/md126 or /dev/sda3). | ||
boot_fs_device=$(findmnt --json --target /boot | jq -r .filesystems[0].source) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, isn't this just
boot_fs_device=$(findmnt --json --target /boot | jq -r .filesystems[0].source) | |
boot_fs_device=$(findmnt -no SOURCE /boot) |
?
which is more commonplace I think in the rest of our code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will update in next upload
boot_fs_partition_json=$( | ||
jq --arg boot_fs_device "${boot_fs_device}" -r ' | ||
[ | ||
.blockdevices[].children[] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be
.blockdevices[].children[] | |
.blockdevices[].children[] // [] |
in case there are devices that don't have children (e.g. unpartitioned disks).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated it to:
.blockdevices[].children[]?
which handles the case where a block device doesn't have any children.
(.fstype == "vfat") and | ||
(.label != null) and | ||
(.label | startswith("esp")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a more canonical check here is to just check the partition type GUID. See https://github.com/coreos/fedora-coreos-config/blob/350e4d59c2ca8dd84419982eb51910961b095f86/overlay.d/05core/usr/lib/dracut/modules.d/40ignition-ostree/ignition-ostree-transposefs.sh#L5.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will update in the next upload.
Sorry that for the misunderstanding, I tried again using When using manual setup, sometimes the service failed (not always): For bootupctl, we need to change that should not rely on partlabel
|
e85157a
to
2604ee6
Compare
The aarch64 kernel changed the file format [1] [2] and older RHEL8 based systems (4.12 and 4.11) need to update the bootloader otherwise the system won't boot when they get upgraded to 4.19 based on RHEL 9.6. Let's add a systemd unit here that will update the bootloader. Also need to add code that will handle the RAID case because bootupd doesn't currently handle that case. It's worth mentioning that we did hit this upstream in Fedora CoreOS as well [3] and we took this fix from that [4] and another similar issue [5] as inspiration here. [1] https://bugzilla.redhat.com/show_bug.cgi?id=2162369 [2] https://issues.redhat.com/browse/RHEL-25537 [3] coreos/fedora-coreos-tracker#1441 [4] coreos/fedora-coreos-config#2308 [5] coreos/fedora-coreos-config#3042 Fixes: https://issues.redhat.com/browse/OCPBUGS-54594
2604ee6
to
7401561
Compare
I just changed the script to not rely on label at all and use parttype instead. |
@dustymabe: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: dustymabe, jlebon, travier The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
I finished testing on this and it looks good. |
Still can not find why Do testing about following scenarios, and result looks good.
|
.blockdevices[] | ||
| select(.children[]?.name == $boot_fs_device) | ||
| .children[] | ||
| select(.parttype == "c12a7328-f81f-11d2-ba4b-00a0c93ec93b") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM, just minor suggestion, can we make this as variable like this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good suggestion.
I think in this case since we've already tested the code as is and this script is only in RHCOS for 4.18 let's save some effort and not change it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thank you for testing everything @HuijingHei. Now all we need is to figure out how to satisfy the bot for jira and then this can merge. |
/jira refresh |
@mike-nguyen: This pull request references Jira Issue OCPBUGS-54594, which is invalid:
Comment In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
@dustymabe I fixed most of it. The release note for the bug is the only thing left so if you could set it this will merge. Thanks for your work on this! |
/jira refresh |
@dustymabe: This pull request references Jira Issue OCPBUGS-54594, which is valid. The bug has been moved to the POST state. 7 validation(s) were run on this bug
Requesting review from QA contact: In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
/unhold |
63b5e6c
into
openshift:release-4.18
@dustymabe: Jira Issue OCPBUGS-54594: All pull requests linked via external trackers have merged: Jira Issue OCPBUGS-54594 has been moved to the MODIFIED state. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
The aarch64 kernel changed the file format [1] [2] and older
RHEL8 based systems (4.12 and 4.11) need to update the bootloader
otherwise the system won't boot when they get upgraded to 4.19
based on RHEL 9.6.
Let's add a systemd unit here that will update the bootloader.
Also need to add code that will handle the RAID case because
bootupd doesn't currently handle that case.
[1] https://bugzilla.redhat.com/show_bug.cgi?id=2162369
[2] https://issues.redhat.com/browse/RHEL-25537
Fixes: https://issues.redhat.com/browse/OCPBUGS-54594