Implement systemd-boot boot assessment #2864

Closed
Tracked by #2127
jimmykarily opened this issue Sep 17, 2024 · 13 comments

@jimmykarily
Contributor

jimmykarily commented Sep 17, 2024

systemd-boot has a way to perform boot assessment and fall back to other entries if booting fails. It is described in detail here and here. It's not very complicated: it only requires us to name the conf/efi files in a certain way and make sure we order the entries properly (so that the right one is picked as a fallback).
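
For illustration, the scheme boils down to the entry file names on the ESP. A sketch, using the active/passive naming that comes up later in this thread (illustrative, not the final layout):

### /efi/loader/entries/ (illustrative)
active+3.conf    # preferred entry, 3 boot attempts left
passive+3.conf   # fallback entry, 3 boot attempts left
recovery.conf    # no counter, never marked bad automatically

systemd-boot sorts entries whose tries-left counter has reached 0 to the end of the list, so naming and ordering the entries correctly determines which one is picked as the fallback.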

Note:
Originally investigated while documenting how Kairos does boot assessment.

@bencorrado
Contributor

I can help test this when someone is ready for testing.

I was also thinking about how the system moves from a failed active AND passive into recovery or reset.

Right now recovery requires human intervention and doesn't load any sysext options, so it has to be pretty bare-bones, since we are keeping UKI images small. I was thinking about building an auto-update script for recovery that runs and tries to fix active/passive by running an upgrade and/or checking an HTTPS website for instructions. It would not auto-update the systemd-boot count for recovery, and would instead let a successful active/passive boot reset recovery's count. This would make sure that if recovery fails to recover the system after X attempts, a reset is triggered, which hopefully can do a better job of setting everything right and blowing away filesystems to clean things up.
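
In loader-entry terms the idea could look something like this (purely hypothetical sketch; the file name and paths are made up):

### /efi/loader/entries/recovery+3.conf (hypothetical)
title   Kairos Recovery
efi     /EFI/kairos/recovery.efi
# recovery never blesses itself; a successful active/passive boot would
# reset this counter, while 3 failed recovery attempts in a row leave the
# entry marked bad and trigger the reset path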

@jimmykarily
Contributor Author

Planning decision:

Let's implement the default fallback mechanism of systemd first and then see if we can implement the auto-reset feature using stages and such (extract to a different ticket when the first part is done).

Being able to auto-reset a system that doesn't boot makes sense, especially in cases like:

@mudler mudler moved this from Todo 🖊 to Under review 🔍 in 🧙Issue tracking board Nov 18, 2024
@Itxaka Itxaka self-assigned this Nov 20, 2024
@Itxaka
Member

Itxaka commented Nov 21, 2024

with the given patch it seems to work, BUT

  • we are missing the systemd-bless-boot service and binary, which updates the tries left/used counters so that after 3 boots an entry is marked as bad
  • even if we make that work, it will not work because we mount the EFI partition RO

Two possible approaches:

  • Mount EFI as RW during initramfs, remount it as RO at the end of the UKI boot process
  • Create our own service that remounts as RW, changes the current entry (mark good basically) and remounts RO

Thoughts, @kairos-io/maintainers?

@Itxaka
Member

Itxaka commented Nov 21, 2024

Basically this is the expected workflow of the boot assessment, for reference: https://systemd.io/AUTOMATIC_BOOT_ASSESSMENT/

Important part below:

Let’s say the second boot succeeds. The kernel initializes properly, systemd is started and invokes all generators.

One of the generators started is systemd-bless-boot-generator which detects that boot counting is used. It hence pulls systemd-bless-boot.service into the initial transaction.

systemd-bless-boot.service is ordered after and Requires= the generic boot-complete.target unit. This unit is hence also pulled into the initial transaction.

The boot-complete.target unit is ordered after and pulls in various units that are required to succeed for the boot process to be considered successful. One such unit is systemd-boot-check-no-failures.service.

systemd-boot-check-no-failures.service is run after all its own dependencies completed, and assesses that the boot completed successfully. It hence exits cleanly.

This allows boot-complete.target to be reached. This signifies to the system that this boot attempt shall be considered successful.

Which in turn permits systemd-bless-boot.service to run. It now determines which boot loader entry file was used to boot the system, and renames it dropping the counter tag. Thus 4.14.11-300.fc27.x86_64+1-2.conf is renamed to 4.14.11-300.fc27.x86_64.conf. From this moment boot counting is turned off for this entry.
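
Condensed, the filename lifecycle from that document looks like this (using the active entry name from this thread):

active+3.conf     # fresh entry: 3 tries left
active+2-1.conf   # renamed by systemd-boot on the first boot attempt: 2 left, 1 used
active+0-3.conf   # all tries used: systemd-boot sorts this entry last ("bad")
active.conf       # renamed by systemd-bless-boot after a successful boot ("good")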

@Itxaka
Member

Itxaka commented Nov 21, 2024

Mount EFI as RW during initramfs, remount it as RO at the end of the UKI boot process

I don't think this works for us, as we need to wait for boot-complete.target, which happens in userspace, not in the initramfs.

We could also have a manual service that runs after systemd's multi-user.target (see the sketch after this list):

  • pre: mounts EFI as RW (it's already mounted as RO)
  • runs the bless-boot binary manually
  • post: remounts EFI as RO
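
A minimal sketch of what such a unit could look like, assuming the EFI partition is mounted at /efi (the unit name is hypothetical; this is the idea rather than the final implementation):

### /etc/systemd/system/bless-boot-manual.service (hypothetical)

[Unit]
Description=Manually mark the current boot entry as good
After=multi-user.target

[Service]
Type=oneshot
ExecStartPre=/usr/bin/mount -o remount,rw /efi
ExecStart=/usr/bin/systemd-bless-boot good
ExecStartPost=/usr/bin/mount -o remount,ro /efi

[Install]
WantedBy=multi-user.target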

@mudler
Member

mudler commented Nov 21, 2024

with the given patch it seems to work, BUT

* we are missing the systemd-bless-boot service and binary, which updates the tries left/used counters so that after 3 boots an entry is marked as bad

* even if we make that work, it will not work because we mount the EFI partition RO

Two possible approaches:

* Mount EFI as RW during initramfs, remount it as RO at the end of the UKI boot process

Mmh, complex but doable. The only challenge I see there is firing the systemd services in exactly that timeframe; not sure that's possible other than by calling systemd-bless-boot inside immucore.

* Create our own service that remounts as RW, changes the current entry (mark good basically) and remounts RO

That looks like the sanest solution at this point. However, my only concern is whether systemd-bless-boot will gain more business logic from systemd that we might miss. Wouldn't it be equivalent at this point to call systemd-bless-boot from immucore directly?

@Itxaka
Member

Itxaka commented Nov 21, 2024

Mmh, complex but doable. The only challenge I see there is firing the systemd services in exactly that timeframe; not sure that's possible other than by calling systemd-bless-boot inside immucore.

Yeah, after deeper checking this won't work, as the blessing happens once the system is fully up, i.e. in userspace once systemctl reports everything as running. Out of immucore's control, unfortunately.

* Create our own service that remounts as RW, changes the current entry (mark good basically) and remounts RO

That looks like the sanest solution at this point. However, my only concern is whether systemd-bless-boot will gain more business logic from systemd that we might miss. Wouldn't it be equivalent at this point to call systemd-bless-boot from immucore directly?

Seems like we may be able to do it ourselves by just calling the binary, mimicking the bless service but with extra steps. Maybe even with a simple override to run pre and post for the mounts. So we don't need to reimplement the whole thing

@bencorrado
Contributor

bencorrado commented Nov 21, 2024

Maybe even with a simple override to run pre and post for the mounts. So we don't need to reimplement the whole thing

That was exactly what I was thinking. We need to modify the path for systemd-bless-boot anyway, since we don't use /boot.

Maybe changing systemd-bless-boot.service with an override file to have something like:

[Service]
# Remount /efi as read-write before starting the main service
ExecStartPre=/usr/bin/mount -o remount,rw /efi

# Reset ExecStart first so the drop-in replaces it rather than appending,
# then modify it to include --path=/efi
ExecStart=
ExecStart=/usr/bin/systemd-bless-boot good --path=/efi

# Remount /efi as read-only after the service completes
ExecStartPost=/usr/bin/mount -o remount,ro /efi

@Itxaka
Member

Itxaka commented Nov 21, 2024

Maybe even with a simple override to run pre and post for the mounts. So we don't need to reimplement the whole thing

That was exactly what I was thinking. We need to modify the path for systemd-bless-boot anyway, since we don't use /boot.

Maybe changing systemd-bless-boot.service with an override file to have something like:

[Service]
# Remount /efi as read-write before starting the main service
ExecStartPre=/usr/bin/mount -o remount,rw /efi

# Reset ExecStart first so the drop-in replaces it rather than appending,
# then modify it to include --path=/efi
ExecStart=
ExecStart=/usr/bin/systemd-bless-boot good --path=/efi

# Remount /efi as read-only after the service completes
ExecStartPost=/usr/bin/mount -o remount,ro /efi

I actually tested this with overrides for mounting/unmounting the partition, and it worked as expected. I think it gets the path automatically, either by identifying the partition type or from the systemd-boot EFI variables, but it does actually work as expected.
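
If in doubt, which path was auto-detected can be checked with standard systemd tooling:

# prints the detected ESP and the boot loader entry currently in use
bootctl status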

@Itxaka
Member

Itxaka commented Nov 22, 2024

With this override the bless-boot service works:

### /etc/systemd/system/systemd-bless-boot.service.d/override.conf

[Service]

ExecStartPre=mount -o remount,rw /efi
ExecStartPost=mount -o remount,ro /efi

Notice that we also need to override another service, systemd-boot-random-seed, as it is pulled in automatically and needs write access to the EFI partition:

### /etc/systemd/system/systemd-boot-random-seed.service.d/override.conf

[Service]

ExecStartPre=mount -o remount,rw /efi
ExecStartPost=mount -o remount,ro /efi
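
With both overrides in place, the result can be verified after a reboot (standard systemd tooling):

# should be "active (exited)" once boot-complete.target was reached
systemctl status systemd-bless-boot.service

# prints the boot counting state of the current entry ("good" after blessing)
systemd-bless-boot status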

@Itxaka
Member

Itxaka commented Nov 22, 2024

There is still an issue, but we can work around it with this:

[Service]

ExecStartPre=mount -o remount,rw /efi
ExecStartPost=sed -i -E 's/(default\s+)*\+[0-9]+(-[0-9]+)?(\.conf)/\1\3/' /efi/loader/loader.conf
ExecStartPost=mount -o remount,ro /efi

So in our loader.conf we set the specific config that we want to run, for example active.conf. With boot assessment this is automatically set to something like active+3.conf.

The main problem is that when bless-boot marks a config as good after booting, it renames it to remove the boot assessment counter (as it's marked as good), so active+3.conf turns into active.conf. But loader.conf is not updated, so it still points to active+3.conf, which no longer matches the actual config. There is glob support in the default stanza, but that's not good enough in case we have extra EFIs with different cmdlines: we want to match the name, or the name plus the boot assessment counter, not a greedy match that could end up picking activeBad.conf.

So to fix that, we can use the service itself to remove any mention of the boot assessment part in loader.conf with sed :D

I tested this with an active+3.conf, which turns into active+2-1.conf on the first boot due to how assessment works; then bless-boot triggered and marked it as good, changing the conf to active.conf. Then sed correctly removed the +3 part from the loader.conf entry.
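
Concretely, the ExecStartPost sed rewrites the default stanza like this (same entry names as in the test above):

# /efi/loader/loader.conf while boot counting is active:
default active+3.conf

# after bless-boot renamed the entry and the sed ran:
default active.conf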

I think we can work with this. I will test it further, but it seems to work as expected.

Moving pieces needed to fully implement this:

@Itxaka Itxaka moved this from In Progress 🏃 to Under review 🔍 in 🧙Issue tracking board Nov 26, 2024
@Itxaka
Member

Itxaka commented Nov 26, 2024

Mostly done; only the agent PR is missing a merge. Then we can test it once it's in the framework and such, but testing it locally it seems to work as expected.

@Itxaka
Member

Itxaka commented Nov 27, 2024

All merged. Created follow-ups:

#3041
#3040

@Itxaka Itxaka closed this as completed Nov 27, 2024
@github-project-automation github-project-automation bot moved this from Under review 🔍 to Done ✅ in 🧙Issue tracking board Nov 27, 2024