Not compatible with Clevis #614
Comments
latchset/clevis#437 suggests that this is not true; see also latchset/clevis#435.
ZFSBootMenu is fundamentally incompatible with systemd in its initramfs image. We rely on dracut or mkinitcpio to build images and provide some very basic initialization, but then expect that ZFSBootMenu will assume total control of the system.
Indeed, I missed this in my research. Thanks for the links! But in my case I am getting the following errors (using the Debian container from #612):
It might be worthwhile, though, to go the Fedora route, as mentioned in latchset/clevis#435. Full log: https://gist.github.com/BohdanTkachenko/97490b50c4b44f5d7c41a143950f501e
Is there a reason why total control is so fundamental here? It seems like systemd does a decent job of controlling the system, and with proper integration the two could work well together?
dracut was decoupled from systemd in Clevis v20; Debian is still on v19 at best.
ZBM is built around a simple event loop and assumes throughout that it is the only actor in the system. Trying to eliminate this assumption to admit the possibility of running other asynchronous initramfs hooks or a general-purpose supervisor would substantially complicate the design of ZFSBootMenu for no benefit. Furthermore, many initramfs hooks to extend functionality are poorly written and rely on overly simplistic assumptions about the state of the system; ZFSBootMenu plays a critical, singular purpose in the boot process, and attempting to reconcile this with the broad ecosystem of general initramfs modules to do... whatever they like... will strain our testing procedure and likely result in reduced reliability of the bootloader.
Got it! Thanks for the pointers and the detailed explanation! So it looks like it's worth exploring two options:
I will update this thread once I verify these options.
I was finally able to make it work. I got stuck on latchset/clevis#456. Other than that, Clevis v20 does indeed solve the issue. Also, regarding my other issue that required me to unlock LUKS twice: ZBM unlocks zroot, so I just copied my keys into /etc/zfs/keys on zroot. It seems to work, although I need to remember to keep those in sync. Thanks for the help!
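Since the copied keys have to be kept in sync by hand, a small check can flag drift. This is a minimal sketch, not something from ZFSBootMenu or Clevis: the function name `keys_in_sync` and the keystore path in the usage comment are hypothetical and would need adjusting to the actual layout.

```shell
# keys_in_sync SRC_DIR DST_DIR
# Succeeds only if every regular file in SRC_DIR has a byte-identical
# copy with the same name in DST_DIR. (Hypothetical helper for illustration.)
keys_in_sync() {
  src="$1"
  dst="$2"
  for key in "$src"/*; do
    # Skip non-files, including the unexpanded glob when SRC_DIR is empty.
    [ -f "$key" ] || continue
    # cmp -s is silent; it returns non-zero if the files differ or the copy is missing.
    cmp -s "$key" "$dst/$(basename "$key")" || return 1
  done
  return 0
}

# Usage (the first path is an assumption about where the keystore is mounted):
#   keys_in_sync /path/to/keystore/keys /etc/zfs/keys || echo "keys differ" >&2
```

The check only reads files, so it should be safe to run from a cron job or a login script on the booted system.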
Glad that you have everything working!
ZFSBootMenu build source
Container build, dracut
ZFSBootMenu version
2.3.0
Boot environment distribution
Debian
Problem description
TL;DR: It is impossible to use ZFSBootMenu with Clevis, because Clevis depends on systemd, which is explicitly disabled by ZFSBootMenu.
I have a very specific setup on my home server and I was successfully using Grub + initramfs for it. However, since I am using ZFS and I had a good experience using ZFSBootMenu on my laptop, I decided to upgrade my setup. Since my home server can reboot because of power loss, I needed some way to unlock it in addition to SSH (to minimize downtime) and I found that I can do that with TPM2 which is provided by Clevis. So my previous setup was the following:
The flow was approximately the following: Grub loads the initramfs from bpool, which includes Dropbear and Clevis; once the keystore is unlocked, all pools are mounted using keys stored on the encrypted LUKS volume. This was done automatically thanks to TPM2, but just in case I could also SSH in to enter the password manually.
So I tried to bring this workflow to ZFSBootMenu. While setting up LUKS and SSH was relatively easy thanks to good documentation and examples, I hit my first roadblock with Clevis.
First of all, there is no Clevis for Void Linux. So my initial instinct was just to compile it manually. But it appears that there are a lot of dependencies that are not packaged for Void Linux either. So I decided to try building it from Debian. See #612, which adds a Dockerfile for Debian.
Overall, everything was working in Debian the same as in Void. That was until I tried to enable Clevis. It appears that it depends on systemd. But systemd is specifically disabled in c96165c, referencing #81 as the reason. And indeed, that was another roadblock that I hit.
I tried to understand the root cause of the issue and it appeared that Dracut when using systemd will try to boot the system automatically, but there is no system yet at that point! We still need ZFSBootMenu to unlock the keystore and import our pool, but systemd does not know this and it thinks that everything is ready.
Initially I tried to mask `initrd-switch-root.service`, which seemed to be causing the trouble, using a script in `rc.d`.
Although it seems ugly to interfere with systemd this way, it worked.
Then I uncovered the next problem: for some reason `$control_term` was not defined in `zfsbootmenu-parse-preinit.sh`. Because of that, it was trying to execute the command `exec setsid bash -c "exec /libexec/zfsbootmenu-init < > 2>&1"`, which bash didn't like. IIUC it is supposed to be defined by `zfsbootmenu-parse-commandline.sh`, and it looks like that script is still called, but `$control_term` is never exported in it? Not sure what I am missing here. But I was able to quickly work around it by exporting `$control_term` explicitly. That allowed me to proceed further.
Now I hit another issue: somehow Dracut still tries to stop my system early, while I am still in ZFSBootMenu! At this point I don't remember exactly how I figured this out, but I came across the line
`exec setsid bash -c "exec /libexec/zfsbootmenu-init <${control_term} >${control_term} 2>&1"`
and realized that it makes the parent script not wait for its completion. Since Dracut executes hooks sequentially and waits for each of them to finish, at this point it thinks that all work is done here (it is not) and that it is safe to proceed (it is not), so it proceeds with the following steps, which in the systemd world also mean system cleanup. It thus does some stuff that messes with ZFSBootMenu and eventually calls the earlier mentioned `initrd-switch-root.service`. So as a workaround I just removed `exec` in front of that line of code and it magically fixed the issue! I was also able to remove the service masking I added before, as now systemd patiently waits for ZBM to finish before cleaning up everything.
I created #613 in case we can discuss it more and find a way to integrate those fixes so they don't break anything else.
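The `$control_term` failure described above can be reproduced outside the initramfs. The sketch below only builds the command string and asks bash to parse it without executing anything. The helper names `build_init_cmd` and `parses_ok` and the `/dev/tty1` fallback are assumptions for illustration, not what ZFSBootMenu actually does.

```shell
# Compose the inner command the same way the quoted line does,
# substituting the terminal path into both redirections.
build_init_cmd() {
  term="$1"
  printf 'exec /libexec/zfsbootmenu-init <%s >%s 2>&1' "$term" "$term"
}

# Return 0 if bash can parse the string, non-zero otherwise.
# -n makes bash read the command without executing it.
parses_ok() {
  bash -n -c "$1" 2>/dev/null
}

unset control_term

# With control_term unset, both redirections lose their targets.
broken="$(build_init_cmd "${control_term:-}")"

# A guard with a fallback value (/dev/tty1 is an assumed default).
guarded="$(build_init_cmd "${control_term:-/dev/tty1}")"

parses_ok "$broken"  || echo "empty control_term: bash rejects '$broken'"
parses_ok "$guarded" && echo "guarded control_term: '$guarded' parses"
```

A `${control_term:-...}` default at the point of use would only hide the missing export, so it is probably better suited as a diagnostic than as the fix discussed in #613.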
So at this point everything works for me. But there is still one issue. Dracut seems to be cleaning up all device-mapper devices in https://github.com/dracutdevs/dracut/blob/master/modules.d/90dm/dm-shutdown.sh so by the time my rootfs initramfs loads, the keystore is locked again. I worked around it by adding another unlock step in my system as well, so it kind of works, but it asks for the password twice. I added Dropbear there as well, but it just looks ugly to SSH in twice to unlock my server. It is not too big an issue, though, since I have Clevis now and in normal cases it will never ask for a password. So this route is mostly for maintenance or recovery work.
Another issue I'm seeing right now is `Timed out waiting for device dev-gpt\x2dauto\x2droot.device - /dev/gpt-auto-root.` when not entering the password for too long. I need to look into that; I think it should be relatively easy to fix.
I would appreciate any advice on the steps I've already taken, whether there was a better way to do things, or if anyone has ideas about the issues that I still have.
Steps to reproduce
n/a