rocky9 image fails deployment #100

Open
wespiard opened this issue Mar 23, 2023 · 23 comments
Labels
Rocky, triaged (Triaged to be addressed in a given cycle)

Comments

@wespiard
Contributor

wespiard commented Mar 23, 2023

First off, I am not sure if this is a MaaS issue, an issue with the packer template, or a tool version issue. I am new to packer, but I've gotten RHEL 8 and Rocky 8 images to be deployed successfully.

I am running packer on the same machine that I'm hosting MaaS with. MaaS is installed via snap, and is version 3.2.7. It is running Ubuntu 20.04.6 LTS (GNU/Linux 5.4.0-144-generic x86_64). The CPU is a Xeon E3-1220 v6, which from what I have found supports x86-64-v2 extensions. Packer is version 1.8.6.

The error reported in the MaaS GUI's log is:

Marking node failed - Missing boot image custom/amd64/ga-20.04/rocky9

After running make in the rocky9 directory, I uploaded the rocky9.tar.gz file using the command in the readme, replacing $PROFILE with my MaaS username:

maas $PROFILE boot-resources create \
    name='custom/rocky9' title='Rocky 9 Custom' \
    architecture='amd64/generic' base_image='rhel/9' filetype='tgz' \
    content@=rocky9.tar.gz

Then, after deploying a new machine in MaaS, I get the error stated above. The installation output log ends with the following:

finish: cmd-install/stage-curthooks/builtin/cmd-curthooks/install-grub: FAIL: installing grub to target devices
finish: cmd-install/stage-curthooks/builtin/cmd-curthooks/configuring-bootloader: FAIL: configuring target system bootloader
finish: cmd-install/stage-curthooks/builtin/cmd-curthooks: FAIL: curtin command curthooks
Traceback (most recent call last):
  File "/curtin/curtin/commands/main.py", line 202, in main
    ret = args.func(args)
  File "/curtin/curtin/commands/curthooks.py", line 1886, in curthooks
    builtin_curthooks(cfg, target, state)
  File "/curtin/curtin/commands/curthooks.py", line 1851, in builtin_curthooks
    setup_grub(cfg, target, osfamily=osfamily,
  File "/curtin/curtin/commands/curthooks.py", line 804, in setup_grub
    install_grub(instdevs, target, uefi=uefi_bootable, grubcfg=grubcfg)
  File "/curtin/curtin/commands/install_grub.py", line 381, in install_grub
    grub_name, grub_target = get_grub_package_name(target_arch, uefi, rhel_ver)
  File "/curtin/curtin/commands/install_grub.py", line 80, in get_grub_package_name
    raise ValueError('Unsupported RHEL version: %s', rhel_ver)
ValueError: ('Unsupported RHEL version: %s', '9')
('Unsupported RHEL version: %s', '9')

Does the version of curtin matter on the build system, or the actual machine the image is being deployed on?

For example, curtin is version 20.1 on my MaaS Ubuntu machine that I'm trying to build the rocky9 image on. But on the machine it's being deployed on, curtin is version 22.1, as shown at the beginning of the installation log: curtin: Installation started. (22.1-0ubuntu1~20.04.1)
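
A side note on the tuple-shaped message at the end of the traceback: the `raise` line passes the format string and the value as two separate arguments, so `%s` is never interpolated and Python prints the exception's argument tuple. The sketch below reproduces that shape. `check_rhel_version` and `SUPPORTED_RHEL` are hypothetical names for illustration and the supported set is an assumption; this is not curtin's actual code.

```python
# Hypothetical stand-in for the version gate seen in the traceback.
# SUPPORTED_RHEL is an assumption for illustration, not curtin's real list.
SUPPORTED_RHEL = {"7", "8"}


def check_rhel_version(rhel_ver: str) -> str:
    """Return the version if supported, otherwise raise as the traceback does."""
    if rhel_ver not in SUPPORTED_RHEL:
        # Two positional arguments: the '%s' is never filled in, so the
        # exception's args (and its printed form) remain a tuple.
        raise ValueError('Unsupported RHEL version: %s', rhel_ver)
    return rhel_ver


def describe_failure(rhel_ver: str) -> str:
    """Return the text Python shows for the raised exception, or 'ok'."""
    try:
        check_rhel_version(rhel_ver)
    except ValueError as exc:
        return str(exc)
    return "ok"


if __name__ == "__main__":
    print(describe_failure("9"))
```

With a version outside the assumed set, the printed text has the same form as the last line of the log, which suggests the failure is a plain version check rather than a problem with the image contents.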

@troyanov
Member

troyanov commented Mar 23, 2023

Hi @wespiard

It seems that you are using legacy boot mode and not UEFI. Can you try with UEFI, or is there a reason why UEFI cannot be used?

According to canonical/curtin@5b89082 support for RHEL9 was added but not released yet.

If you need legacy boot mode, I would suggest patching curtin yourself.
Since you are using the snap, you will have to get the latest snap, unpack it, patch curtin, repack, and install:

snap download maas
unsquashfs maas_xxx.snap
# apply the curtin patch inside ./squashfs-root before repacking
snap pack ./squashfs-root

sudo snap install --dangerous maas_xxx.snap

And don't forget to connect snap slots and plugs after that (since for --dangerous mode that doesn't happen automatically):

snap connections maas | awk '$1 != "content" && $3 == "-" {print $2}' | xargs -r -n1 sudo snap connect
sudo snap restart maas
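
For readers unfamiliar with awk, the filter in the pipeline above can be sketched in Python. It keeps rows whose first column (interface) is not `content` and whose third column (slot) is `-`, and emits the second column (plug). The `SAMPLE` text is made up to match the column layout of `snap connections`; the plug names are hypothetical, and `plugs_to_connect` is an illustrative helper, not part of snapd or MAAS.

```python
def plugs_to_connect(snap_connections_output: str) -> list[str]:
    """Mirror of: awk '$1 != "content" && $3 == "-" {print $2}'."""
    plugs = []
    for line in snap_connections_output.splitlines():
        fields = line.split()
        if len(fields) < 3:
            # awk would see an empty $3 here, which never equals "-".
            continue
        interface, plug, slot = fields[0], fields[1], fields[2]
        if interface != "content" and slot == "-":
            plugs.append(plug)
    return plugs


# Made-up sample in the column layout of `snap connections` (hypothetical rows).
SAMPLE = """\
Interface         Plug                   Slot           Notes
content           maas:maas-cli          -              -
hardware-observe  maas:hardware-observe  -              -
network           maas:network           :network       -
network-bind      maas:network-bind      :network-bind  -
kvm               maas:kvm               -              -
"""

if __name__ == "__main__":
    for plug in plugs_to_connect(SAMPLE):
        print("sudo snap connect", plug)
```

The header row is skipped naturally because its third field is the word `Slot`, not `-`.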

@troyanov troyanov added the Rocky label Mar 23, 2023
@wespiard
Contributor Author

@troyanov thank you for the reply.

I'm not sure exactly what you mean. Are you saying the machine I'm trying to deploy is configured to PXE boot in legacy boot mode (via the BIOS)? Or is there a boot configuration in MaaS settings?

I don't think I consciously made a decision to use legacy boot mode anywhere, so it may not be necessary.

Thanks!

@troyanov
Member

via the BIOS

Correct, that should be configurable in BIOS settings.

Example: [image: BIOS boot mode setting]

@wespiard
Contributor Author

Upon changing this setting in my BIOS (Dell PowerEdge R730), the MaaS event log just gets stuck in a loop after repeatedly failing to PXE boot, with this message: TFTP Request - pxelinux.0

With a KVM connected to see what is happening, I get the following output:

Booting from PXE Device 1: Integrated NIC 1 Port 1 Partition 1
    Downloading NBP file...
        Succeed to download NBP file.
Boot Failed: PXE Device 1: Integrated NIC 1 Port 1 Partition 1

Then it tries to boot from the next boot options (HDD, Optical drive, etc.) and fails. Then every 30 seconds or so it retries and fails, etc.

@wespiard
Contributor Author

wespiard commented Mar 23, 2023

I looked through the MaaS dhcpd.conf file for some context, and there's an if-else block towards the bottom that I interpret as pointing to a PXE boot file. Note that MaaS is not actually acting as a DHCP server for the main NICs that are being PXE booted; I just had MaaS' dhcpd.conf handy to reference and learn.

Should it be requesting bootx64.efi or something like that, or is pxelinux.0 correct?

@troyanov
Member

troyanov commented Mar 24, 2023

@wespiard it seems that you have to stay with "legacy boot" option then.

If patching curtin is not an option for you, you can try to set the base image to rhel/8
According to canonical/curtin@5b89082 it might work

@troyanov
Member

Note that MaaS is not actually acting as a DHCP server for the main NICs that are being PXE booted

So you are using external DHCP? Can you tell a bit more about your setup, do you have MAAS DHCP as a next-server for netboot? From your original message I thought you got your machine booting under MAAS direction and all DHCP was handled by MAAS.

Should it be requesting bootx64.efi or something like that, or is pxelinux.0 correct?

IIRC pxelinux.0 will be requested for legacy BIOS, bootx64 is an EFI bootloader.
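
A DHCP server usually chooses the boot file from the client architecture code the machine reports in DHCP option 93 (RFC 4578). The sketch below illustrates that selection. The architecture codes are the ones defined in the RFC, but the code-to-file table and the fallback to `pxelinux.0` are assumptions for illustration, not a copy of the MAAS or corporate dhcpd.conf; `select_bootfile` is a hypothetical helper.

```python
# Client architecture codes as carried in DHCP option 93 (RFC 4578).
# The code-to-file mapping below is an illustrative assumption.
BOOTFILE_BY_ARCH = {
    0x0000: "pxelinux.0",   # Intel x86PC (legacy BIOS)
    0x0007: "bootx64.efi",  # EFI BC (commonly reported by x86-64 UEFI firmware)
    0x0009: "bootx64.efi",  # EFI x86-64
}

# Assumed default when the architecture is unknown or not reported.
FALLBACK_BOOTFILE = "pxelinux.0"


def select_bootfile(arch_code: int) -> str:
    """Pick a boot file name for a client architecture code."""
    return BOOTFILE_BY_ARCH.get(arch_code, FALLBACK_BOOTFILE)


if __name__ == "__main__":
    for code in (0x0000, 0x0007, 0x0009, 0x000B):
        print(f"arch {code:#06x} -> {select_bootfile(code)}")
```

If a server has no UEFI entry at all, every client falls through to the legacy file, which is one possible reason a UEFI machine might still be seen requesting `pxelinux.0`.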

@wespiard
Contributor Author

@wespiard it seems that you have to stay with "legacy boot" option then.

If patching curtin is not an option for you, you can try to set the base image to rhel/8

According to canonical/curtin@5b89082 it might work

This sounds like the easiest option for now. I'll give it a shot.

@wespiard
Contributor Author

So you are using external DHCP? Can you tell a bit more about your setup, do you have MAAS DHCP as a next-server for netboot? From your original message I thought you got your machine booting under MAAS direction and all DHCP was handled by MAAS.

Yeah, sorry. External DHCP server that I don't really have control over. The subnet our MaaS machines are on is part of a large corporate network.

So I'm assuming that DHCP server is configured to point to MaaS for PXE booting?

IIRC pxelinux.0 will be requested for legacy BIOS, bootx64 is an EFI bootloader.

So because it looks like 'pxelinux.0' is being requested even when configured for UEFI boot, is the external DHCP server not configured properly to handle it?

@troyanov
Member

So I'm assuming that DHCP server is configured to point to MaaS for PXE booting?

It should, yes.

I've gotten RHEL 8 and Rocky 8 images to be deployed successfully

Since you mentioned this in your very first message, I would assume that DHCP was configured correctly.

@troyanov
Member

So because it looks like 'pxelinux.0' is being requested even when configured for UEFI boot, is the external DHCP server not configured properly to handle it?

I am wondering if this is some sort of fallback scenario: it tried a UEFI boot and then fell back to legacy BIOS.

@wespiard
Contributor Author

I tried re-uploading the image to MaaS with base_image='rhel/8', but got the same error.

maas $PROFILE boot-resources create \
    name='custom/rocky9' title='Rocky 9 Custom' \
    architecture='amd64/generic' base_image='rhel/8' filetype='tgz' \
    content@=rocky9.tar.gz

Looking through the curtin functions more, it looks like target is read here: rhel_ver = (distro.rpm_get_dist_id(target)

I didn't want to follow the code too much, but I'm assuming it's reading the version from the image itself, not just the parameters we supply to MaaS? Seems like the proper way to do it, I suppose.

Also, I noticed this commit this morning: 65e270b

Does MaaS actually need to be 3.3 for Rocky 9? My version is 3.2.7.

I am okay using Rocky 8 for the time being if there isn't an easy solution for Rocky 9. Don't want to use up too much of your time as there aren't any specific reasons I need 9 over 8 as of now.
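
The point above, that the version appears to come from the image rather than from the `base_image` parameter, can be illustrated with a small sketch. It swaps in a simpler technique (parsing `etc/os-release` under the target root) and is not curtin's implementation; `major_version_from_target` and `demo` are hypothetical names, and the os-release content is made up.

```python
import tempfile
from pathlib import Path


def major_version_from_target(target_root: str) -> str:
    """Read VERSION_ID from <target_root>/etc/os-release and return its major part."""
    os_release = Path(target_root) / "etc" / "os-release"
    for line in os_release.read_text().splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "VERSION_ID":
            return value.strip().strip('"').split(".")[0]
    raise ValueError("VERSION_ID not found in %s" % os_release)


def demo() -> str:
    """Build a throwaway target tree that resembles a Rocky 9 image."""
    with tempfile.TemporaryDirectory() as root:
        etc = Path(root) / "etc"
        etc.mkdir()
        (etc / "os-release").write_text('NAME="Rocky Linux"\nVERSION_ID="9.1"\n')
        # Whatever base_image was given to MAAS plays no part in this lookup.
        return major_version_from_target(root)


if __name__ == "__main__":
    print(demo())
```

Under this model, re-uploading the same tarball with `base_image='rhel/8'` would not change the detected version, which matches the unchanged error reported above.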

@SK1Y101 SK1Y101 added the triaged (Triaged to be addressed in a given cycle) label Aug 17, 2023
@SK1Y101
Member

SK1Y101 commented Aug 17, 2023

Hey @wespiard, is this still an issue you're encountering?

@wespiard
Contributor Author

Hey @wespiard, is this still an issue you're encountering?

I just stuck with Rocky/RHEL 8. I don't have any test machines to play with right now, so I can't promise when I'll try rocky9 next.

@SK1Y101
Member

SK1Y101 commented Aug 17, 2023

Alright, sounds good.
I've tried deploying rocky9 with MAAS 3.2 and encountered a few errors. I can't say for certain that it's impossible: the errors may have come from a different issue with my setup, but they suggested that it does not work.

@CodeBleu

CodeBleu commented Sep 5, 2023

I'm on MaaS 3.3.4 and I get the same error when trying to deploy Rocky 9.

finish: cmd-install/stage-curthooks/builtin/cmd-curthooks/install-grub: FAIL: installing grub to target devices
finish: cmd-install/stage-curthooks/builtin/cmd-curthooks/configuring-bootloader: FAIL: configuring target system bootloader
finish: cmd-install/stage-curthooks/builtin/cmd-curthooks: FAIL: curtin command curthooks
Traceback (most recent call last):
  File "/curtin/curtin/commands/main.py", line 202, in main
    ret = args.func(args)
  File "/curtin/curtin/commands/curthooks.py", line 1886, in curthooks
    builtin_curthooks(cfg, target, state)
  File "/curtin/curtin/commands/curthooks.py", line 1851, in builtin_curthooks
    setup_grub(cfg, target, osfamily=osfamily,
  File "/curtin/curtin/commands/curthooks.py", line 804, in setup_grub
    install_grub(instdevs, target, uefi=uefi_bootable, grubcfg=grubcfg)
  File "/curtin/curtin/commands/install_grub.py", line 381, in install_grub
    grub_name, grub_target = get_grub_package_name(target_arch, uefi, rhel_ver)
  File "/curtin/curtin/commands/install_grub.py", line 80, in get_grub_package_name
    raise ValueError('Unsupported RHEL version: %s', rhel_ver)
ValueError: ('Unsupported RHEL version: %s', '9')
('Unsupported RHEL version: %s', '9')
curtin: Installation failed with exception: Unexpected error while running command.
Command: ['curtin', 'curthooks']
Exit code: 3
Reason: -
Stdout: start: cmd-install/stage-curthooks/builtin/cmd-curthooks: curtin command curthooks

@zoltan

zoltan commented Nov 9, 2023

This is because the maas snap is built on core22, which unfortunately ships a curtin that is too old (v21.3). The commit referenced above first landed in 23.1.1.
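
To check a given curtin version string against that threshold, a minimal sketch follows. It assumes 23.1.1 is the first release with the fix, as stated in this comment, and that the version string starts with a dotted numeric part, as in the `22.1-0ubuntu1~20.04.1` string from the installation log. `upstream_version` and `has_rhel9_support` are hypothetical helpers.

```python
import re

# First curtin release said (in this thread) to contain the RHEL 9 commit.
REQUIRED = (23, 1, 1)


def upstream_version(version: str) -> tuple[int, ...]:
    """Extract the leading dotted-numeric part, e.g. '22.1-0ubuntu1' -> (22, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version.lstrip("v"))
    if match is None:
        raise ValueError(f"no numeric version in {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def has_rhel9_support(version: str) -> bool:
    """True when the upstream version is at least REQUIRED."""
    return upstream_version(version) >= REQUIRED


if __name__ == "__main__":
    for v in ("20.1", "v21.3", "22.1-0ubuntu1~20.04.1", "23.1.1"):
        print(v, has_rhel9_support(v))
```

Every version reported earlier in the thread (20.1, 21.3, 22.1) falls below the threshold under this comparison.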

@lwandrebeck
Contributor

I can confirm 8/9 install used to fail with 3.3/edge. Works fine since I upgraded to latest/edge (3.5 alpha1 which comes with curtin 23.1.1).

@Shadowphax

I can confirm 8/9 install used to fail with 3.3/edge. Works fine since I upgraded to latest/edge (3.5 alpha1 which comes with curtin 23.1.1).

Where did you obtain 3.5 from? I need to deploy Rocky 9.x on a few systems and am getting the same issue.

@lwandrebeck
Contributor

Via snap latest/edge. Or you can build your own package by cloning the master branch of the git repo, if you prefer.

@pdion891

We used this project to create a Rocky 9.1 template, and the template works on both UEFI and legacy. It looks like the issue is related to recent versions of Rocky 9; the current latest is 9.4.

With Rocky 9.4:

  • UEFI: boot instructions are missing from the disk, so the boot fails
  • LEGACY: boot of Rocky stalls at Probing EDD (edd=off to disable)... ok

Tested on MaaS 3.4.0 and 3.2.9.

Note: to build 9.1 without issues with packer, it was installed using the cdrom as the source in the kickstart, from the full 9.1 ISO at: https://dl.rockylinux.org/vault/rocky/9.1/isos/x86_64/

@Shadowphax

Shadowphax commented Jan 17, 2024

@pdion891 - I've documented the issue here with Rocky - https://discourse.maas.io/t/rocky-9-3-deployment/7744. This should resolve the boot issues for both legacy and UEFI.

@SK1Y101
Member

SK1Y101 commented Jan 17, 2024

Also, the bug report for the linked post is #191.

8 participants