console=ttyS0 is too slow and useless #48

Open
xnox opened this issue Jun 1, 2020 · 48 comments
@xnox
Contributor

xnox commented Jun 1, 2020

console=ttyS0 is specified in the gadget by default, in UC20, for all modes: recovery, install, and run mode.

However, on hardware that does not have a serial console (the majority of real x86 hardware), this option delays the boot by 90s, because the kernel keeps polling for the serial console to appear.

Furthermore, if a serial console is present, the baud rate is not set high enough, so boots are still painfully slow.
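
For a rough sense of scale, here is a back-of-the-envelope sketch of how long a serial line needs to push boot messages. It assumes 8N1 framing (10 bits on the wire per byte) and a hypothetical 100 KiB of kernel/journal output; both are illustrative assumptions, not measurements from the pc gadget, and 9600 and 115200 are simply two common rates.

```python
# Rough estimate of time spent writing boot messages to a serial console.
# Assumes 8N1 framing: 1 start bit + 8 data bits + 1 stop bit = 10 bits per byte.
BITS_PER_BYTE_ON_WIRE = 10


def serial_transfer_seconds(num_bytes: int, baud: int) -> float:
    """Return the seconds needed to send num_bytes at the given baud rate."""
    return num_bytes * BITS_PER_BYTE_ON_WIRE / baud


if __name__ == "__main__":
    log_bytes = 100 * 1024  # hypothetical amount of boot output
    for baud in (9600, 115200):
        secs = serial_transfer_seconds(log_bytes, baud)
        print(f"{baud:>6} baud: {secs:.1f}s")
```

The point is only that the time scales linearly with the amount of output and inversely with the baud rate, so anything that writes synchronously to the serial line is bounded by it.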

I would like to drop serial console option from the pc gadget.
If not completely, I can see the value of keeping it for the recover mode.
Alternatively I think we should publish a separate serial pc gadget, that specifies only the serial console with a high baud rate.

Could we make console a grubenv parameter, such that ubuntu-image / snap-prepare-image can modify it, and it would persist from install mode to sealed secrets and run/recover modes?

Also see https://bugs.launchpad.net/ubuntu/+source/snapd/+bug/1879290

@xnox
Contributor Author

xnox commented Jun 1, 2020

Or, for example, only enable and run console-conf on it, and not make the kernel/journald slowly push messages to the serial console, delaying the whole boot.

@anonymouse64
Contributor

Note that changes to the kernel command line should probably not be made until snapd has support for reading the kernel command line from the gadget.yaml or otherwise, because right now the kernel command line we seal the TPM to is hard-coded in the snapd snap, so changing this will break FDE.
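
To illustrate why a changed command line breaks unsealing, here is a simplified model of a TPM PCR extend (new value = SHA-256 of the old value concatenated with the digest of the measured data). Which PCR is used and exactly what gets measured are not covered here; the command lines are made-up examples, and this is only a sketch of the principle, not snapd's sealing code.

```python
import hashlib


def pcr_extend(pcr: bytes, data: bytes) -> bytes:
    """Model a SHA-256 PCR extend: new = H(old || H(data))."""
    return hashlib.sha256(pcr + hashlib.sha256(data).digest()).digest()


def measure_cmdline(cmdline: str) -> bytes:
    """Extend an all-zero PCR with a kernel command line (simplified)."""
    initial = bytes(32)  # model of a freshly reset PCR: 32 zero bytes
    return pcr_extend(initial, cmdline.encode("utf-8"))


if __name__ == "__main__":
    # Hypothetical command lines, for illustration only.
    sealed = measure_cmdline("snapd_recovery_mode=run console=ttyS0 panic=-1")
    booted = measure_cmdline("snapd_recovery_mode=run panic=-1")
    # A key sealed to `sealed` is only released when the PCR value matches.
    print("unseal would succeed:", sealed == booted)
```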

I believe @bboozzoo was working on the feature to read the kernel command-line from the gadget, do you have a status update on that?

@xnox
Contributor Author

xnox commented Jun 1, 2020

@anonymouse64 i am aware of the current duplication / disconnect of the gadget vs sealing code, so yeah will not push out an update to this uncoordinated.

@ogra1
Collaborator

ogra1 commented Jun 1, 2020

all x86 IoT gateways i have touched yet (as well as most servers) do default to using a serial console ... dropping it completely doesn't smell like a good plan ...

@xnox
Contributor Author

xnox commented Jun 1, 2020

as well as most servers

The current reference target for the pc gadget is Intel NUC, which does not have serial console by default.

Ubuntu Core does not target servers.

Can you please elaborate on the "IoT gateways" => can they run the stock PC gadget, or have custom ones? I thought they all have custom gadgets and do not use the reference PC gadget.

@xnox
Contributor Author

xnox commented Jun 1, 2020

Also clouds may or may not have serial console, but they should be forking their own gadget anyway.

@xnox
Contributor Author

xnox commented Jun 1, 2020

@anonymouse64 @bboozzoo if it helps, we can turn console=* values into a variable in either the stock grubenv file or a custom grubenv file, i.e. an install-settings.conf grubenv file which holds overrides for the console= values to use/seal, and for things like cloud-init datasources to use/seal.

@anonymouse64
Contributor

anonymouse64 commented Jun 1, 2020

My 2¢ here is that we should probably:

  • always leave the serial console on for run mode and recover modes in the default gadget
  • always have the serial console go as fast as possible when enabled
  • make the kernel cmdline for using the serial console configurable when building the pc gadget in a very easy way that is encoded into the built snap (i.e. not something that gets dropped into the image to modify the behavior, it must be in the .snap file)

I agree with @ogra1: it might be the case that the Intel NUC doesn't have a serial port, but another IoT amd64 edge gateway we enabled UC16 for, the Dell gateways, does have physical serial ports.

@ogra1
Collaborator

ogra1 commented Jun 1, 2020

Can you please elaborate on the "IoT gateways" => can they run the stock PC gadget, or have custom ones? I thought they all have custom gadgets and do not use the reference PC gadget.

well, the dell gateways (which admittedly currently come with a custom gadget) would be an example, all advantech ones i have touched yet. but also 90% of other Industrial PCs that might "just install" x86 focused images we provide on cdimage.

after all the typical IoT or industrial PC is often headless, yet an x86 base often means you can use an uncustomized image on them, unlike with arm devices where you can not have a generic image easily due to HW specific bootloaders.

EDIT: i mentioned servers simply because IoT GWs are typically a cut down server, not a cut down desktop ...

@anonymouse64
Contributor

x86 base often means you can use an uncustomized image on them

servers simply because IoT GWs are typically a cut down server

This is precisely why I think we should leave serial on by default in the pc gadget, so that folks can "test-drive" UC on their IoT devices by just flashing a released default image and logging in with console-conf via serial, without needing to build their own gadget snap/image.

@xnox
Contributor Author

xnox commented Jun 1, 2020

My 2¢ here is that we should probably:

  • always leave the serial console on for run mode and recover modes in the default gadget
  • always have the serial console go as fast as possible when enabled
  • make the kernel cmdline for using the serial console configurable

That will cost us a lot of boot time out of the box. Even "as fast as possible" is very slow. 30s+ of additional boot time.

Note, this is about dropping "console=" from the kernel command line, to stop forcing the kernel to slow down its boot to the speed at which it can push kmsg to the serial console.

This is not about stopping/preventing console-conf from running on serial consoles. By default it is spawned on them all.
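
As a sketch of what dropping "console=" means for the command line itself, the helper below removes serial console arguments from a command line string and leaves everything else alone. The function name and the sample command line are illustrative assumptions; this is not how the gadget or snapd edit the command line.

```python
def drop_serial_consoles(cmdline: str) -> str:
    """Remove console=ttyS* arguments, keeping all other parameters in order.

    Splits on whitespace only, so quoted arguments containing spaces
    are not handled by this simplified version.
    """
    kept = [
        arg for arg in cmdline.split()
        if not arg.startswith("console=ttyS")
    ]
    return " ".join(kept)


if __name__ == "__main__":
    before = "console=ttyS0 console=tty1 panic=-1"  # hypothetical example
    print(drop_serial_consoles(before))
```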

@xnox
Contributor Author

xnox commented Jun 1, 2020

EDIT: i mentioned servers simply because IoT GWs are typically a cut down server, not a cut down desktop ...

It's an embedded platform. Neither desktop nor server. Because for something to be called a server, I expect 1TB of RAM, 1PB of NVMe storage, RAID, InfiniBand, etc.

@ogra1
Collaborator

ogra1 commented Jun 1, 2020

while console-conf will indeed still come up, are there not menu bits at the initrd level now that would also use the defined console= ?

indeed, if it is just kernel boot messages we lose, that's completely negligible and i'd fully agree with the removal, but AFAIK there are potentially interactive bits before systemd kicks in as well

@anonymouse64
Contributor

This is not about stopping/preventing consoleconf to run on serial consoles. By default it is spawned on them all.

So w/o console=ttyS0 in the kernel command line for run mode, what would the user experience be like? They plug in their device, look at a blank serial console for ... however many minutes, and then magically at some point console-conf shows up?

@xnox
Contributor Author

xnox commented Jun 1, 2020

This is not about stopping/preventing consoleconf to run on serial consoles. By default it is spawned on them all.

So w/o console=ttyS0 in the kernel commandline for run mode, what would the user experience be like? They plug in their device look at a blank serial console for ... however many minutes and then magically at some point console-conf shows up?

Good question. Need to double check experimentally, I can record some videos.

Somehow it still feels wrong to have both enabled by default on any hardware. It almost feels more appropriate to detect console in grub, and if it is serial pass serial console to the kernel, if it's video pass video to the kernel.

@xnox
Contributor Author

xnox commented Jun 1, 2020

This is not about stopping/preventing consoleconf to run on serial consoles. By default it is spawned on them all.

So w/o console=ttyS0 in the kernel commandline for run mode, what would the user experience be like? They plug in their device look at a blank serial console for ... however many minutes and then magically at some point console-conf shows up?

We know that today the experience is a 30s+ hang with no output from the kernel while it waits for a serial console that does not exist to show up, because we force the kernel to look for one when there isn't one.

@anonymouse64
Contributor

We know that today, the experience is of 30s+ hang with no output from the kernel

Arguably this is a regression from UC18 -> UC20 in that there is a 30s+ hang with no output from the kernel on non-serial TTYs because the kernel is stuck trying to write to a non-existent serial TTY.

I'd hate to introduce what appears to be a hang on serial TTYs just because we don't want what appears to be a hang on non-serial TTYs.

It almost feels more appropriate to detect console in grub, and if it is serial pass serial console to the kernel, if it's video pass video to the kernel.

This would be great but I don't know how we can do that while still enabling automatic FDE by sealing the kernel command-line against the TPM, unless both snapd + grub somehow learn to check if there are serial TTY's on the system, etc. Maybe there's a simpler solution I'm not aware of.

@xnox
Contributor Author

xnox commented Jun 1, 2020

We know that today, the experience is of 30s+ hang with no output from the kernel

Arguably this is a regression from UC18 -> UC20 in that there is a 30s+ hang with no output from the kernel on non-serial TTYs because the kernel is stuck trying to write to a non-existent serial TTY.

I'd hate to introduce what appears to be a hang on serial TTYs just because we don't want what appears to be a hang on non-serial TTYs.

Not a regression, UC18 also hangs in the same way.

It almost feels more appropriate to detect console in grub, and if it is serial pass serial console to the kernel, if it's video pass video to the kernel.

This would be great but I don't know how we can do that while still enabling automatic FDE by sealing the kernel command-line against the TPM, unless both snapd + grub somehow learn to check if there are serial TTY's on the system, etc. Maybe there's a simpler solution I'm not aware of.

As per original London sprint design, snapd must seal against the install-time dynamic cmdline and persist that through modes/kernel updates and resealings. That was the requirement of the original design. Currently snapd doesn't do resealing as far as I can tell, but it must support that.

@jocado

jocado commented Nov 10, 2020

Is there any movement or update on this ?

I'm particularly interested, as this causes an artificially long boot time on NUCs, and that eats into our Service Level budget on updates that require a reboot.

Also, now that snapd seems to be in control of the grub config, what is the recommended way to change the linux commandline ? Is it even possible from the gadget ?

Thanks.

@anonymouse64
Contributor

As per original London sprint design, snapd must seal against the install-time dynamic cmdline and persist that through modes/kernel updates and resealings

I can't speak to the original London sprint design as I wasn't there and joined the project later, but the new plan is to have snapd dynamically generate the kernel command line that is to be used with sealing using the following things:

  • the recovery mode of the system (snapd_recovery_mode and friends)
  • some static parameters required for all systems (just panic=-1 to my knowledge)
  • other things that the gadget must specify in a way that is cryptographically asserted (i.e. contained inside the gadget snap and not what the system was booted with)

The last bit is what we are currently missing from snapd, which is a way for a gadget snap to specify additional kernel command line parameters. We have a rough plan and will implement it soon.
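
The three inputs listed above could be combined roughly as follows. This is a hedged sketch: `compose_cmdline` and its parameters are hypothetical names, the accepted mode values are an assumption, and the ordering of the pieces is a guess rather than snapd's actual behaviour.

```python
from typing import Sequence


def compose_cmdline(mode: str,
                    static_args: Sequence[str] = ("panic=-1",),
                    gadget_args: Sequence[str] = ()) -> str:
    """Build a kernel command line from mode, static, and gadget-provided parts."""
    # Assumed set of modes, based on the run/recover/install modes in this thread.
    if mode not in ("run", "recover", "install"):
        raise ValueError(f"unknown mode: {mode!r}")
    parts = [f"snapd_recovery_mode={mode}", *static_args, *gadget_args]
    return " ".join(parts)


if __name__ == "__main__":
    # The gadget-provided argument here is a made-up example.
    print(compose_cmdline("run", gadget_args=["console=ttyS0,115200n8"]))
```

Because the result is computed only from the mode, the static arguments, and what is inside the gadget snap, the same string can be reproduced at sealing time without trusting what the system was booted with.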

Also, now that snapd seems to be in control of the grub config, what is the recommended way to change the linux commandline ? Is it even possible from the gadget ?

Currently there is not a way to configure the command line without recompiling snapd. As mentioned, we will be working on a way to do this soon.

@jocado

jocado commented Nov 10, 2020

Thanks for the info.

Sounds like, if the only static config is panic=-1, serial console by default is being removed ?

@anonymouse64
Contributor

Ah yes, sorry, I forgot to explain that too. Currently both panic=-1 and the console=... settings are considered part of the static snapd config. Once we have the mechanism for gadgets to set additional kernel command line parameters, we will move console=... from inside snapd to the gadget. The default gadget snap published by Canonical will likely still support the serial console, but a fork of the Canonical gadget snap could easily remove it if desired.

@jocado

jocado commented Nov 10, 2020

Perfect - sounds good.

@xnox
Contributor Author

xnox commented Nov 10, 2020

I have outstanding tasks to experiment with master serial console options, and/or speeding up the kernel's serial console.

@jocado

jocado commented Nov 30, 2020

Hi.

Just wondering, seeing as snapd 2.48 was supposed to be the target release for UC20, and it's at candidate stage, whether there is any way I can test changing the grub config via snapd 2.48 yet?

@anonymouse64
Contributor

@jocado the feature enabling gadget-specified kernel command line options will not be in 2.48. It is still under very active development but is getting much closer; for example, see canonical/snapd#9724 and canonical/snapd#9719, which bring us closer to the final bits needed for this. It is unclear whether we will backport those changes to 2.48 so they are available in e.g. 2.48.1, or whether the feature will just go into 2.49.

@jocado

jocado commented Feb 15, 2021

Hi @anonymouse64

Just checking in to see whether we can yet, or have a good idea of when we will be able to, disable the serial console args in the kernel command line via the gadget.

Is it supported in snapd 2.49 which is currently in the beta channel ?

Thanks!

@anonymouse64
Contributor

@jocado unfortunately no, 2.49 does not have the full set of changes yet, we will keep you updated on when the feature is enabled. Thanks for your patience.

@jocado

jocado commented May 12, 2021

It looks like we are very close now :)

https://forum.snapcraft.io/t/customising-uc20-kernel-command-line-arguments/24370

You will see a comment from me there. I have tested it, and it's working for me with the current edge revision.

@anonymouse64 Is there any rough release date for snapd-2.50 ?

@anonymouse64
Contributor

@jocado snapd 2.50 is being released to stable as we speak. It is released in phases, so not every device will get it at the same time, but by the current looks of it I think it should be phased in to 100% of devices within the next 24 hours.

@jocado

jocado commented May 12, 2021

@anonymouse64 which revision will it be though ? As the feature only seemed to be working for me in the current edge channel.

2.50+git1692.g1286560 2021-05-12 (11995)

Candidate 11841 was not working as expected.

@anonymouse64
Contributor

Revision 11841 is being released, can you detail in the forum post how the candidate channel didn't work for you?

@jocado

jocado commented May 12, 2021

It was simply that the cmdline.full was not respected, the default was still in use.

I did try to look around in logs etc., but I didn't see any useful or obvious clues. I can add the contents of the cmdline.full, and any other details I can think of. I will do that tomorrow.

@bboozzoo
Contributor

It'd be interesting to see debug level logs. You can add snapd.debug=1 to the command line to enable debug logging in snapd.

Perhaps it's also useful to take a look at the spread test we have: https://github.com/snapcore/snapd/blob/master/tests/nested/manual/core20-custom-kernel-commandline/task.yaml. The test repacks the pc gadget and goes through the cmdline.extra/cmdline.full variants.
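
A sketch of how the two variants might be resolved, assuming (the thread does not spell this out) that cmdline.extra is appended to the default static arguments while cmdline.full replaces them, and that having both is an error. The default string and the function name are hypothetical, used only for illustration.

```python
from pathlib import Path

# Hypothetical default static arguments, used only for illustration.
DEFAULT_STATIC = "console=ttyS0 console=tty1 panic=-1"


def resolve_static_cmdline(gadget_dir: Path, default: str = DEFAULT_STATIC) -> str:
    """Pick static kernel arguments based on files in an unpacked gadget."""
    full = gadget_dir / "cmdline.full"
    extra = gadget_dir / "cmdline.extra"
    if full.exists() and extra.exists():
        raise ValueError("gadget has both cmdline.full and cmdline.extra")
    if full.exists():
        # Assumed semantics: cmdline.full replaces the default entirely.
        return " ".join(full.read_text().split())
    if extra.exists():
        # Assumed semantics: cmdline.extra is appended to the default.
        return " ".join([default, *extra.read_text().split()])
    return default
```

Comparing the result against the contents of /proc/cmdline on the booted device is one way to check whether the gadget file was respected.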

@bboozzoo
Contributor

It was simply that the cmdline.full was not respected, the default was still in use.

I did try to look around in logs etc., but I didn't see any useful or obvious clues. I can add the contents of the cmdline.full, and any other details I can think of. I will do that tomorrow.

BTW. have you installed the device from scratch maybe? snapd 2.50 carries an update to the boot script which supports cmdline.full, however, we decided to not bump the boot config version number, thus your current boot script will not get automatically updated.

@jocado

jocado commented May 13, 2021

I did install it from scratch yes. That is one of our common use cases currently.

What should I expect in that situation though? It doesn't work from system bootstrap, but then works at some point in the future, the next time the gadget is updated perhaps?

@bboozzoo
Contributor

I did install it from scratch yes. That is one of our common use cases currently.

What should I expect in that situation though? It doesn't work from system bootstrap, but then works at some point in the future, the next time the gadget is updated perhaps?

I'm looking into it right now. Looks like there's some mixup with what was cherry picked for 2.50. Some bits made it, but ones that glue everything together did not. I need to double check with @mvo5 but we may need to do 2.50.1.

In the meantime, can you try edge branch?

@jocado

jocado commented May 13, 2021

It worked 100% for me with the edge revision yesterday.

@bboozzoo
Contributor

It worked 100% for me with the edge revision yesterday.

That's good. When I have a branch for 2.50 ready, I'll add a link to it here. We build artifacts with the snapd snap as part of the workflow, you'll be able to grab it from there and verify.

@jocado

jocado commented May 13, 2021

Great - thank you 👍

@bboozzoo
Contributor

The branch is up: canonical/snapd#10265. AFAIK we haven't decided yet whether this will be in 2.50.

@jocado

jocado commented May 13, 2021

ok - thanks 🤞 - we are very keen for this feature 🙂

@bboozzoo
Contributor

The tests have finished, and relevant ones were successful. When you click on the test workflow details, you should be able to access artifacts, which is a zip file with the snapd snap from that branch inside.

@jocado

jocado commented May 14, 2021

Hi. Sorry for the delayed response. Just to confirm, the artifact above seemed to work for me.

@jocado

jocado commented May 18, 2021

Just following on from last week: given the revision that made it to current/stable, I presume we are looking at 2.50.1 now.

Not looking for absolutes, but is there any kind of rough ETA for that ? Are we talking weeks or months ?

@jocado

jocado commented Jun 8, 2021

Hi.

Can anyone confirm whether we are looking at 2.50.1 for the above change, or is it a 2.51 change now [ which looks to be incoming anyway ]?

@anonymouse64
Contributor

2.50.1 has the fix and should be in stable now, but yes 2.51 also has the full fix and should be headed to stable next week hopefully.
