Inconsistent GRUB Configuration on .raw images #2980

Open
dnugmanov opened this issue Nov 4, 2024 · 13 comments · Fixed by kairos-io/kairos-agent#600
Labels: area/agent, bug, triage

Comments

@dnugmanov

dnugmanov commented Nov 4, 2024

Kairos version: v3.2.1

Describe the bug
I encountered an issue with TPM OOM, which was fixed much earlier than my Kairos release. While debugging, I discovered some behavior with raw images that I did not expect. The root cause was that I used the latest released AuroraBoot version (v0.2.7), which propagates outdated GRUB files, instead of the latest :shortsha tag.

Observations

  1. GRUB Configuration Source for RAW images:

    • GRUB is always loaded from the COS_RECOVERY partition, not COS_STATE. This seems unusual, as it introduces a dependency on the recovery partition for core boot configurations.
    • The grub.cfg on the EFI partition always points to COS_RECOVERY:
    cat /tmp/grub_mount/EFI/BOOT/grub.cfg
    
    search --no-floppy --label --set=root COS_RECOVERY
    set root=($root)
    set prefix=($root)/grub2
  2. Tooling Version Impact:

    • The final image state is affected by the version of the tooling. There’s a lack of alignment between Kairos and AuroraBoot versions:
      • The latest AuroraBoot release was v0.2.7 (2023), which points to v0.9.0 of os-builder.
      • os-builder points to :latest packages; its last release was on Oct 4, 2023.
      • The necessary TPM fix was committed on Jan 9, yet the fix isn’t consistently applied.

To Reproduce
Build any raw image with the last released AuroraBoot version (not from a commit tag, but the tagged release).

Expected Behavior
There should be no dependency on tooling versions when building raw images, or tooling versions should be explicitly set in a dependency matrix.
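
One way to picture the "explicitly set" part is a check that refuses floating tags before a build starts. The sketch below is only an illustration: the helper name `unpinned_tools` and the `pins_for_build` mapping are hypothetical and not part of Kairos or AuroraBoot. The versions are the ones mentioned in this issue, with the packages entry left floating to mirror the problem described above.

```python
from typing import Dict, List

# Tags that do not identify a fixed version (assumption: an empty value
# is treated the same way as "latest").
FLOATING_TAGS = {"latest", ""}


def unpinned_tools(pins: Dict[str, str]) -> List[str]:
    """Return the names of tools whose version is missing or a floating tag."""
    return sorted(
        name
        for name, version in pins.items()
        if version.strip().lstrip(":") in FLOATING_TAGS
    )


# Illustrative entry only, not a tested compatibility list.
pins_for_build = {
    "auroraboot": "v0.2.7",
    "os-builder": "v0.9.0",
    "packages": ":latest",
}

print(unpinned_tools(pins_for_build))  # ['packages']
```

A build script could fail early when this list is non-empty, which would make the tooling versions behind a given raw image explicit.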

Topics for Discussion:

  • Are there plans to create a release cycle for AuroraBoot or to change the configuration logic from static to dynamic (e.g., by parsing release files inside Kairos and checking out the corresponding package versions) rather than baking files inside tooling containers?
  • Is pointing from the EFI partition to the COS_RECOVERY partition correct, or should we regenerate the configuration to point to COS_STATE during reset/upgrade?
@dnugmanov added the bug, triage, and unconfirmed labels Nov 4, 2024
@jimmykarily moved this to In Progress 🏃 in 🧙Issue tracking board Nov 4, 2024
@Itxaka
Member

Itxaka commented Nov 8, 2024

Is pointing from the EFI partition to the COS_RECOVERY partition correct, or should we regenerate the configuration to point to COS_STATE during reset/upgrade?

That is correct. The idea was that on the raw images we ship only the recovery partition, and on boot it runs the reset and recreates the system and state partitions from scratch. I'm not even sure that is running as expected now, so we should revisit it.

Are there plans to create a release cycle for AuroraBoot or to change the configuration logic from static to dynamic (e.g., by parsing release files inside Kairos and checking out the corresponding package versions) rather than baking files inside tooling containers?

We are currently reworking all of AuroraBoot. That includes merging things like enki into it (we are merging enki directly into aurora right now), expanding the netboot lib to support more netboot scenarios, and shipping more artifacts bundled with it. So the release cadence should pick up very soon, and AuroraBoot should become less reliant on static artifacts.

As you can see, we already had issues with this GRUB stuff in aurora, and moving to something more dynamic was requested more than 6 months ago in #2573 :)

@Itxaka self-assigned this Nov 19, 2024
@Itxaka
Member

Itxaka commented Nov 19, 2024

OK, I checked, and this is indeed fine: the GRUB config has to point to the recovery one and chainload it.

This is because on a raw image we only ship the OEM and recovery partitions, and on the first boot the reset should kick in and properly set up the instance/VM/node to a proper image.

It should actually override the GRUB values to point to the state partition once reset, so maybe we lost that part? Let me run this image in a VM and reset it to check.

@Itxaka
Member

Itxaka commented Nov 19, 2024

Can't even reset: #3025

@dnugmanov
Author

This is because on a raw image we only ship the OEM and recovery partitions, and on the first boot the reset should kick in and properly set up the instance/VM/node to a proper image.

Yes, the issue is likely somewhere here. Even after resetting, GRUB on EFI is still pointing to COS_RECOVERY.

Can't even reset: #3025

I added a comment to that issue.

@Itxaka
Member

Itxaka commented Nov 19, 2024

Another ticket from this: #3026

@Itxaka
Member

Itxaka commented Nov 19, 2024

Indeed, after resetting, the file still points to recovery, which is bad:

root@cos-recovery:~# cat /run/cos/efi/EFI/BOOT/grub.cfg 
search --no-floppy --label --set=root COS_RECOVERY
set root=($root)
set prefix=($root)/grub2
configfile ($root)/etc/cos/grub.cfg
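
To check which partition a given grub.cfg points at without reading it by eye, the label can be parsed from its `search` line. This is a minimal sketch: the helper name `grub_root_label` is hypothetical (not part of Kairos), and it assumes the file uses the `search --no-floppy --label --set=root <LABEL>` form shown above.

```python
from typing import Optional


def grub_root_label(cfg_text: str) -> Optional[str]:
    """Return the label from the first `search ... --label ...` line, if any.

    Hypothetical helper: it only understands the form shown in this thread,
    where the label is the last argument that is not an option.
    """
    for line in cfg_text.splitlines():
        tokens = line.split()
        if not tokens or tokens[0] != "search" or "--label" not in tokens:
            continue
        # Positional arguments are the tokens that do not start with "-".
        positional = [t for t in tokens[1:] if not t.startswith("-")]
        if positional:
            return positional[-1]
    return None


# Contents of the EFI grub.cfg as pasted in this comment.
SAMPLE = """\
search --no-floppy --label --set=root COS_RECOVERY
set root=($root)
set prefix=($root)/grub2
configfile ($root)/etc/cos/grub.cfg
"""

print(grub_root_label(SAMPLE))  # COS_RECOVERY
```

Run against the file after a reset, a result of COS_RECOVERY would reproduce what is reported here, while COS_STATE would be the expected outcome according to the earlier comments.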

@Itxaka
Member

Itxaka commented Nov 19, 2024

And another ticket: #3027

@Itxaka
Member

Itxaka commented Nov 19, 2024

Found it! Looks like we forgot to mount the EFI partition during reset, so it copied the proper files, but not into the actual partition, just to the local ephemeral dir, so nothing changed.
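
This class of bug can be guarded against by refusing to write when the target directory is not a mount point. The sketch below only illustrates the idea and is not the fix that went into kairos-agent: the function `write_grub_cfg` and its `require_mount` parameter are hypothetical, and the `EFI/BOOT/grub.cfg` layout is taken from the paths pasted in this thread.

```python
import os
import tempfile


def write_grub_cfg(cfg_text: str, efi_dir: str, require_mount: bool = True) -> str:
    """Write EFI/BOOT/grub.cfg under efi_dir and return the file path.

    Raises RuntimeError if efi_dir is not a mount point, so the file cannot
    silently land in a plain directory that disappears after reboot.
    """
    if require_mount and not os.path.ismount(efi_dir):
        raise RuntimeError(f"{efi_dir} is not a mount point; refusing to write grub.cfg")
    target_dir = os.path.join(efi_dir, "EFI", "BOOT")
    os.makedirs(target_dir, exist_ok=True)
    target = os.path.join(target_dir, "grub.cfg")
    with open(target, "w", encoding="utf-8") as fh:
        fh.write(cfg_text)
    return target


# A freshly created temporary directory stands in for the unmounted path.
with tempfile.TemporaryDirectory() as plain_dir:
    try:
        write_grub_cfg("search --no-floppy --label --set=root COS_STATE\n", plain_dir)
    except RuntimeError as err:
        print(f"refused: {err}")
```

With a check like this, a reset that forgot the mount would fail loudly instead of appearing to succeed.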

@Itxaka moved this from In Progress 🏃 to Under review 🔍 in 🧙Issue tracking board Nov 19, 2024
@dnugmanov
Author

Does the bug affect the upgrade process? Our current workflow is as follows: raw (recovery) > reset > upgrade. However, even after completing this workflow, EFI is still pointing to COS_RECOVERY.

@Itxaka
Member

Itxaka commented Nov 19, 2024

Does the bug affect the upgrade process? Our current workflow is as follows: raw (recovery) > reset > upgrade. However, even after completing this workflow, EFI is still pointing to COS_RECOVERY.

IIRC upgrade does not touch the GRUB part; it only upgrades active/passive/recovery, so it won't touch that file at all.

@dnugmanov
Author

Thanks, @Itxaka. I will test it as soon as the new Kairos 3.2.x with bumped kairos-agent is released.

@Itxaka reopened this Nov 19, 2024
@github-project-automation bot moved this from Done ✅ to Under review 🔍 in 🧙Issue tracking board Nov 19, 2024
@Itxaka
Member

Itxaka commented Nov 19, 2024

Reopening until it lands in Kairos and we check the BIOS part as well.

@Itxaka
Member

Itxaka commented Nov 20, 2024

BIOS does not need anything mounted; reset already reinstalls GRUB to the proper partition directly!
