systemd >= 256 needs plain cgroupv2 support #11857

Open
1 of 2 tasks
Vogtinator opened this issue Jul 30, 2024 · 7 comments

@Vogtinator

Windows Version

Microsoft Windows [Version 10.0.19045.4291] (11 is affected the same way)

WSL Version

2.2.4.0

Are you using WSL 1 or WSL 2?

  • [x] WSL 2
  • [ ] WSL 1

Kernel Version

5.15.153.1-2

Distro Version

openSUSE Tumbleweed

Other Software

No response

Repro Steps

  1. Install any distro with systemd 256 on WSL 2
  2. wsl --shutdown
  3. wsl systemctl is-system-running

Expected Behavior

wsl systemctl is-system-running should immediately report running.

Actual Behavior

wsl systemctl is-system-running fails with an error like Failed to connect to bus: No such file or directory.
Executing wsl systemctl is-system-running again will eventually succeed and return starting and running.

Diagnostic Logs

This is caused by systemd/systemd#32998. The host VM uses a "hybrid" cgroup v1 setup, which is no longer supported by systemd >= 256 (https://github.com/systemd/systemd/releases/tag/v256-rc3).

As systemd is running in a container here ("wsl"), it warns about this by showing a message and waiting 30s before booting. That delay is longer than WSL's 10000ms timeout, so it triggers the /sbin/init failed to start within a 10000ms timeout warning, and the command is executed before systemd is up.
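
Until WSL boots with cgroup v2, a script that depends on systemd being up can poll instead of assuming the first call succeeds. The following is only a minimal sketch of that idea; the helper name `retry_until` and its argument order are hypothetical and not part of WSL or systemd.

```shell
#!/bin/sh
# Hypothetical helper (not part of WSL or systemd):
#   retry_until MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it exits with status 0 or MAX_TRIES attempts were made.
# Returns 0 on the first success, 1 if every attempt failed.
retry_until() {
    max_tries=$1
    delay=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$max_tries" ]; do
        if "$@"; then
            return 0
        fi
        attempt=$((attempt + 1))
        # Only sleep if another attempt will follow.
        if [ "$attempt" -le "$max_tries" ]; then
            sleep "$delay"
        fi
    done
    return 1
}
```

Inside the distro this could be called as `retry_until 45 1 systemctl is-system-running --quiet`, where 45 tries and a 1 second delay are illustrative values chosen to cover the 30s pause. Note that `systemctl is-system-running` only exits with status 0 in the running state, so a degraded system would use up all attempts.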


Logs are required for review from WSL team

If this is a feature request, please reply with '/feature'. If this is a question, reply with '/question'.
Otherwise please attach logs by following the instructions below, your issue will not be reviewed unless they are added. These logs will help us understand what is going on in your machine.

How to collect WSL logs

Download and execute collect-wsl-logs.ps1 in an administrative powershell prompt:

Invoke-WebRequest -UseBasicParsing "https://raw.githubusercontent.com/microsoft/WSL/master/diagnostics/collect-wsl-logs.ps1" -OutFile collect-wsl-logs.ps1
Set-ExecutionPolicy Bypass -Scope Process -Force
.\collect-wsl-logs.ps1

The script will output the path of the log file once done.

If this is a networking issue, please use collect-networking-logs.ps1, following the instructions here

Once completed please upload the output files to this Github issue.

Click here for more info on logging
If you choose to email these logs instead of attaching to the bug, please send them to [email protected] with the number of the github issue in the subject, and in the message a link to your comment in the github issue and reply with '/emailed-logs'.

@Vogtinator
Author

Vogtinator commented Jul 30, 2024

I can't collect logs from my system right now, so for now I'll just refer to the ones from #11739 which has the same cause: https://github.com/user-attachments/files/16076419/WslLogs-2024-07-02_20-47-17.zip


Diagnostic information
Detected appx version: 2.2.4.0

@suiryc

suiryc commented Jul 30, 2024

Very interesting.

By enabling the debug console (debugConsole = true in .wslconfig) I can indeed see the warning:

Legacy cgroup v1 support selected. This is no longer supported. Will proceed anyway after 30s.

This matches what I am seeing: systemd does nothing for 30s, then finally starts its services and becomes visible.

There are some other interesting things to notice.
By default, cgroup v1 is used, and we have this issue where systemd does not really start for 30s while WSL does its login job (too soon). Hence a lot of side effects, like /tmp being mounted later and messing with other things.
Pointers that cgroup v1 is used in this case:

$ stat -fc %T /sys/fs/cgroup/
tmpfs
$ cat /sys/fs/cgroup/cgroup.controllers
cat: /sys/fs/cgroup/cgroup.controllers: No such file or directory

Now by enabling memory reclaim (in .wslconfig)

[experimental]
autoMemoryReclaim = gradual

Apparently the system is started with cgroup v2.

$ stat -fc %T /sys/fs/cgroup/
cgroup2fs
$ cat /sys/fs/cgroup/cgroup.controllers
cpuset cpu io memory hugetlb pids rdma misc

In this case, systemd starts immediately (without the warning), and everything works as it did previously with systemd v255 on cgroup v1.
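
The two checks above can be folded into one small script. This is a rough sketch, not an official detection method: the function names are made up for illustration, and the `/sys/fs/cgroup/unified` test for hybrid mode follows the layout systemd uses for its hybrid hierarchy, which is an assumption about the WSL setup rather than something shown in this thread.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only.

# classify_cgroup FSTYPE HAS_UNIFIED_DIR
# Maps the filesystem type of /sys/fs/cgroup to a cgroup mode:
#   cgroup2fs                     -> v2
#   tmpfs with a unified/ subdir  -> hybrid (assumed systemd hybrid layout)
#   tmpfs without it              -> v1
#   anything else                 -> unknown
classify_cgroup() {
    case "$1" in
        cgroup2fs)
            echo "v2"
            ;;
        tmpfs)
            if [ "${2:-no}" = "yes" ]; then
                echo "hybrid"
            else
                echo "v1"
            fi
            ;;
        *)
            echo "unknown"
            ;;
    esac
}

# detect_cgroup_mode [ROOT]
# Inspects ROOT (default /sys/fs/cgroup) with the same stat call used above.
detect_cgroup_mode() {
    root=${1:-/sys/fs/cgroup}
    fstype=$(stat -fc %T "$root" 2>/dev/null) || fstype=unknown
    if [ -d "$root/unified" ]; then
        has_unified=yes
    else
        has_unified=no
    fi
    classify_cgroup "$fstype" "$has_unified"
}
```

Calling `detect_cgroup_mode` inside the distro prints a single word, which is easier to use in scripts than comparing the `stat` and `cat` output by hand.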

Alternatively, as commented in issue #6662 (comment), setting the following kernel parameters (again in .wslconfig) also makes WSL start with cgroup v2:

[wsl2]
kernelCommandLine = cgroup_no_v1=all systemd.unified_cgroup_hierarchy=1

So there are two ways to have WSL+systemd v256 work correctly by having WSL use cgroup v2.
The question is whether those are to be considered proper solutions, or only workarounds until something more automatic is done in WSL to address this issue.

@Vogtinator
Author

> So there are two ways to have WSL+systemd v256 work correctly by having WSL use cgroup v2.
> The question is whether those are to be considered proper solutions, or only workarounds until something more automatic is done in WSL to address this issue.

Yeah. If there's nothing or almost nothing left that requires cgroupv1, then just switching to cgroupv2 fully by default is the proper solution.

Otherwise it might need some new parameter in wsl.conf that specifies whether a distro needs cgroupv1 or v2. I'm not sure whether it's possible to have a cgroupv1 container next to a cgroupv2 container with cgroup namespaces though.

@WH-2099

WH-2099 commented Aug 13, 2024

I got something more.
The systemd compatibility layer in the WSL2 kernel has some problems determining whether this is the first boot.
Neither systemd.firstboot=false nor systemd.condition-first-boot=false, added to the kernel command line, prevented systemd-firstboot.service from starting. In fact, judging by the output of systemd-analyze condition 'ConditionFirstBoot=true', the kernel doesn't seem to be handling the relevant parameters correctly.

@Vogtinator
Author

Vogtinator commented Aug 13, 2024

The Linux kernel doesn't do anything with systemd parameters. systemd just reads them from /proc/cmdline.

In any case, this is unrelated to the issue here.
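
Since systemd reads its options from /proc/cmdline, a quick way to see which ones it was actually handed is to filter that file for words starting with `systemd.`. The snippet below is a small sketch of that check; the function name is hypothetical, and it splits on whitespace only, so quoted values containing spaces are not handled.

```shell
#!/bin/sh
# Hypothetical helper for illustration only.
# systemd_cmdline_opts CMDLINE
# Prints every whitespace-separated word of CMDLINE that starts with
# "systemd.", one per line. Exits non-zero if there is no such word.
systemd_cmdline_opts() {
    printf '%s\n' "$1" | tr -s ' \t' '\n' | grep '^systemd\.'
}
```

Inside the distro, `systemd_cmdline_opts "$(cat /proc/cmdline)"` lists the options that made it through from kernelCommandLine in .wslconfig.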
