Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: Checking and fixing system config for NBDE #17796

Merged
merged 1 commit into from
Jan 13, 2023

Conversation

mvollmer
Copy link
Member

@mvollmer mvollmer commented Oct 6, 2022

Outdated demo: https://www.youtube.com/watch?v=mbv-syeLHsc


Release note:

Storage: Set up a system to use NBDE

It is possible (and unfortunately likely) that a system is misconfigured and can not mount filesystems during boot that rely on Clevis to unlock them. Cockpit can now fix this when adding a keyserver to a encrypted filesystem.

image

@mvollmer mvollmer temporarily deployed to cockpit-dist October 6, 2022 12:20 Inactive
@mvollmer mvollmer temporarily deployed to cockpit-dist October 7, 2022 11:37 Inactive
@mvollmer mvollmer added the no-test For doc/workflow changes, or experiments which don't need a full CI run, label Oct 11, 2022
@mvollmer mvollmer temporarily deployed to cockpit-dist October 11, 2022 10:09 Inactive
@mvollmer mvollmer temporarily deployed to cockpit-dist October 11, 2022 12:24 Inactive
@mvollmer mvollmer temporarily deployed to cockpit-dist October 12, 2022 14:25 Inactive
@mvollmer mvollmer temporarily deployed to cockpit-dist October 13, 2022 15:38 Inactive
@mvollmer mvollmer temporarily deployed to cockpit-dist October 14, 2022 09:47 Inactive
@mvollmer mvollmer temporarily deployed to cockpit-dist October 17, 2022 07:57 Inactive
@mvollmer mvollmer temporarily deployed to cockpit-dist October 17, 2022 08:20 Inactive
@mvollmer mvollmer force-pushed the storage-nbde-help branch 2 times, most recently from ee93d26 to 038db1b Compare October 17, 2022 10:07
@mvollmer mvollmer temporarily deployed to cockpit-dist October 17, 2022 10:12 Inactive
@mvollmer mvollmer temporarily deployed to cockpit-dist October 17, 2022 10:22 Inactive
@mvollmer mvollmer temporarily deployed to cockpit-dist October 17, 2022 12:01 Inactive
@garrett
Copy link
Member

garrett commented Oct 17, 2022

Awesome work!

I have a bunch of design-related feedback:

  1. The location is not correct. (You know this, and you said it.)

    • This is a system-level option, right? Yet you have it under a partition on the encryption tab. We need to promote it higher up in the hierarchy. This means it should primarily be on the top-level storage page somewhere, somehow.
    • I'm assuming that you normally would only run the check (and fix, if needed) once per system. Is there a way to save (on a system level, not in-browser) the check to know if it's been done yet or not? We could, for example, also show a message on an unchecked system but not on a passing system. (Near where it currently is, as NBDE needs support for it.)
    • Perhaps we should make something like a two-step dialog, similar to a wizard, where the first page is checking for NBDE and fixing it if we aren't sure the system supports it. The second "page" of the modal would then be using NBDE within Cockpit. If we know the system supports NBDE correctly, then we'd skip directly to the second page of the modal and only show that. (It wouldn't look like a wizard.) We do this elsewhere in Cockpit (but I don't remember where) — or at least I know I've made mockups for this concept before (perhaps this isn't implemented yet?). In this way, we wouldn't need to show any messaging about NBDE unless the user is actively going to use it.
  2. Actions on modals should always be a button styled button (primary, secondary, warn, danger, etc.) and never should be a link styled button, except for "Cancel" and the cancel action must always be paired with an action button. "Close" is not "Cancel". Close always needs to be a secondary button style.

  3. We probably don't need to show a passing checklist. We could show the current step and problems. And we would show a success if everything passes. We don't need to show details of things that are OK. (If we're only showing the errors and them being fixed, then it's OK to show them in an OK state after fixing them, I guess.)

  4. "Fix" should probably elaborate what it is fixing, I think?

  5. If it's running a medium-length process that depends on the modal being there, then you shouldn't be able to close. And you probably shouldn't be able to cancel, unless there will be no ill effects. A longer process must always allow for the modal to be closed an have status information in the page, however. (Unless everything else on the page depends on the process, such as the update process on the Software Updates page, as an example.)

@mvollmer
Copy link
Member Author

mvollmer commented Oct 18, 2022

I have a bunch of design-related feedback:

Thanks! This is really helpful.

This is a system-level option, right?

Mostly, yes, and we should move it up in the hierarchy. However, there are two sets of checks-and-fixes: One for the root filesystem, and another one for everything else. Currently, the buttons in the filesystem tab start one of these two flows, depending on whether the filesystem in question is the root filesystem or not.

However, the checks and fixes for a non-root filesystem are much lighter than for the root filesystem. They might not justify the full blown dialog that is implemented here. In any case, NBDE for the root filesystem is the main thing we are addressing here, so let's ignore non-root filesystems for now.

Thus, yes, this is a system-level option and should be somewhere higher up.

I'm assuming that you normally would only run the check (and fix, if needed) once per system.

Yes. However, it is always possible that the system gets broken later on. Hmm. The best would be to find a very fast reliable way to figure out if anything needs to be fixed that can be done every time the Storage page loads. I will have a try at this. (We would need to check whether the root filesystem uses NBDE, and whether the initrd includes support for this.)

However, this is starting to move us outside of the scope of Cockpit, isn't it? Isn't this something that Insights should be doing? (Checking your system in the background for things that need fixing, alerting you about them, and offering ways to actually fix them.)

Cockpit could certainly show the alerts and let people carry out the fixes, but I'd say we shouldn't go and run the actual checks in the background.

So, what if, as a first step, we only do the check-and-fix flow when adding NBDE to the root filesystem, and forget about a global button? (If I find a quick way to check whether to show this button, we can reconsider this.)

Perhaps we should make something like a two-step dialog, similar to a wizard, where the first page is checking for NBDE and fixing it if we aren't sure the system supports it.

Adding a NBDE key is already a wizard: First you enter the contact information for the server, then you need to compare fingerprints and give the final OK. In my current implementation, the check-and-fix then runs as a third step. I agree that it is much better to run it first: If the fixing fails, you should probably not add the NBDE key.

But the first step in the existing wizard includes selecting whether to add a regular passphrase, or a NBDE key. So we would have step 1: Select NBDE, enter server address; step 2: check system for NBDE support; step 3: compare and confirm fingerprints. Is that okay?

As you propose, I think we can do the check when clicking "Apply" in step 1, and immediately skip to step 3 when nothing needs fixing. Step 2 would only happen when there is something to fix.

@garrett
Copy link
Member

garrett commented Oct 18, 2022

So, what if, as a first step, we only do the check-and-fix flow when adding NBDE to the root filesystem, and forget about a global button? (If I find a quick way to check whether to show this button, we can reconsider this.)

Sure.

But the first step in the existing wizard includes selecting whether to add a regular passphrase, or a NBDE key. So we would have step 1: Select NBDE, enter server address; step 2: check system for NBDE support; step 3: compare and confirm fingerprints. Is that okay?

Wizards are for interactive input. Checking the system is not interactive; it's something that does not require input from the user. Therefore, it should not be a step in a wizard.

And this doesn't sound like it should be a wizard. A wizard is a particular UI pattern. You're talking about a process.

But as the verification step does require user input (accepting the key or not), it could be a wizard, I suppose. That's better than two sequential dialogs. But it's not as good as a dialog that changes based on context. It could be implemented as a wizard, but we don't need (nor want) a sidebar, a big header, or next/back buttons that PF wizards provide.

Here's a mockup of how it could work. It definitely needs updates and iteration.

nbde-add excalidraw

(I made this in ExcaliDraw, so you could just open this PNG there if you want to edit it and it would show you the vector form, as it's embedded.)

@garrett
Copy link
Member

garrett commented Oct 18, 2022

Also: Since we'll check for a number of things for NBDE support and it takes a little while, the checking step could have a progress bar, as we can just divide it by # of steps and when each completes, advance the bar.

@garrett
Copy link
Member

garrett commented Oct 18, 2022

Here's a revised flow based on IRC feedback:

nbde-add-flow excalidraw

@jelly
Copy link
Member

jelly commented Nov 25, 2022

image

Minor nitpick, maybe we should have some more spacing after the SHA1 hash.

jelly
jelly previously approved these changes Nov 25, 2022
Copy link
Member

@jelly jelly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, some minor nitpicks/questions.

subprocess.run("virsh -c qemu:///session screenshot %s '%s'" % (str(machine._domain.ID()), name),
shell=True)
attach(name, move=True)
print("Wrote screenshot to " + name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is cool! Would be nice if this was more generic and maybe by default in wait_reboot. But then we'd have to add a wrapper in testlib.py so self.wait_reboot as that can call attach.

b.wait_not_present(panel + 'ul li:contains("Slot 1")')

@skipImage("TODO: don't know how to encrypt the rootfs",
"debian-stable", "debian-testing", "ubuntu-stable", "ubuntu-2204", "arch")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arch is waiting on latchset/clevis#374 I'll try to poke the PR

progress(_("Checking for NBDE support in the initrd"), () => task.close());
return task.then(data => {
progress(null, null);
if (data.indexOf("clevis") < 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels a little brittle, as also lsinitrd -m also prints early CPIO image and stuff. But there doesn't seem to be a better way.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I understand that. But it also prints other stuff. It was more a wish that more tools would have parseable output with for example --json.


function ensure_root_nbde_support(steps, progress) {
progress(_("Adding rd.neednet=1 to kernel command line"), null);
return cockpit.spawn(["grubby", "--update-kernel=ALL", "--args=rd.neednet=1"],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this might be a silly question, but doesn't grubby also re-generate initramfs. So in theory we could first install clevis-dracut. But that's a real nitpick I feel :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

grubby does not regenerate the initrd. At least it never did for me.

root_row = i
break

tab = self.content_tab_expand(root_row, 3)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is no "/" this will blow up, I guess that's intendend?

@mvollmer
Copy link
Member Author

If there is no "/" this will blow up, I guess that's intendend?

It's tolerated. :-) We could produce a better error message, but this really shouldn't happen at all.

@mvollmer mvollmer requested a review from garrett December 14, 2022 11:17
@mvollmer mvollmer temporarily deployed to cockpit-dist December 21, 2022 08:11 — with GitHub Actions Inactive
@mvollmer mvollmer temporarily deployed to cockpit-dist January 6, 2023 10:10 — with GitHub Actions Inactive
@mvollmer mvollmer temporarily deployed to cockpit-dist January 9, 2023 08:30 — with GitHub Actions Inactive
@mvollmer mvollmer temporarily deployed to cockpit-dist January 9, 2023 11:23 — with GitHub Actions Inactive
@mvollmer mvollmer temporarily deployed to cockpit-dist January 10, 2023 10:36 — with GitHub Actions Inactive
@mvollmer mvollmer temporarily deployed to cockpit-dist January 10, 2023 10:53 — with GitHub Actions Inactive
Comment on lines +273 to +275
if (p.waiting) {
text = _("Waiting for other software management operations to finish");
} else if (p.package) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These 3 added lines are not executed by any test. Details

} else if (p.package) {
let fmt;
if (p.info == PkEnum.INFO_DOWNLOADING)
fmt = _("Downloading $0");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This added line is not executed by any test. Details

if (p.info == PkEnum.INFO_DOWNLOADING)
fmt = _("Downloading $0");
else if (p.info == PkEnum.INFO_REMOVING)
fmt = _("Removing $0");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This added line is not executed by any test. Details

title: cockpit.format(_("The $0 package must be installed."), package_name),
func: progress => {
if (data.remove_names.length > 0)
return Promise.reject(cockpit.format(_("Installing $0 would remove $1."), name, data.remove_names[0]));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This added line is not executed by any test. Details

if (data.remove_names.length > 0)
return Promise.reject(cockpit.format(_("Installing $0 would remove $1."), name, data.remove_names[0]));
else if (data.unavailable_names.length > 0)
return Promise.reject(cockpit.format(_("The $0 package is not available from any repository."), name));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This added line is not executed by any test. Details


function ensure_initrd_clevis_support(steps, progress, package_name) {
const task = cockpit.spawn(["lsinitrd", "-m"], { superuser: true, err: "message" });
progress(_("Checking for NBDE support in the initrd"), () => task.close());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This added line is not executed by any test. Details

const fsys_options = fsys_config && parse_options(decode_filename(fsys_config[1].opts.v));

if (!fsys_options || fsys_options.indexOf(option) >= 0)
return Promise.resolve();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This added line is not executed by any test. Details

const crypto_config = array_find(block.Configuration, function (c) { return c[0] == "crypttab" });
const crypto_options = crypto_config && parse_options(decode_filename(crypto_config[1].options.v));
if (!crypto_options || crypto_options.indexOf(option) >= 0)
return Promise.resolve();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This added line is not executed by any test. Details

Comment on lines +409 to +411
} else
return Promise.resolve();
} else
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These 3 added lines are not executed by any test. Details

@mvollmer mvollmer requested a review from jelly January 12, 2023 11:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants