
systemctl initializing #356

Open
CoachYT1 opened this issue Mar 14, 2024 · 23 comments

@CoachYT1

Describe the issue
After updating to the latest ArchWSL, systemctl is not working: systemctl status shows the state as initializing.

To Reproduce
Update to the latest ArchWSL and perform a clean installation.

Expected behavior
systemctl should work normally.

Screenshots
[screenshot]

Environment:

  • Windows build number: 10.0.22631.3155
  • Security Software: Malwarebytes Premium
  • WSL version 1/2: WSL 2
  • ArchWSL version: 24.3.11.0
  • ArchWSL Installer type: zip
  • Launcher version: 23072600
@9numbernine9

This might be related to the systemd announcement that support for cgroups v1 will be dropped "in a release after 2023" (ref). systemd currently works in my ArchWSL environment, but I explicitly disabled cgroups v1 support inside WSL.

You can try this yourself and see if it helps:

  • Run wsl --shutdown to terminate all running WSL instances.
  • Add a %USERPROFILE%\.wslconfig file (or edit it if it already exists) and make sure that it contains:
      [wsl2]
      kernelCommandLine = cgroup_no_v1=all
  • Wait 10 seconds or so, then restart your Arch WSL.
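After the restart, you can check from inside ArchWSL whether the option actually took effect. The sketch below assumes bash and GNU coreutils stat; the helper names are made up for illustration. The mapping from filesystem type to cgroup layout (cgroup2fs for a pure v2 hierarchy, tmpfs for a v1 or hybrid layout) is the common convention, not something specific to WSL, so treat the result as a hint.

```bash
#!/usr/bin/env bash
# Sketch: check from inside the distro whether cgroup v1 is disabled.
# cgroup_mode and has_no_v1_flag are hypothetical helper names.

# Map the filesystem type mounted at /sys/fs/cgroup to a rough layout.
cgroup_mode() {
  case "$1" in
    cgroup2fs) echo "v2" ;;            # unified hierarchy only
    tmpfs)     echo "v1-or-hybrid" ;;  # v1 controllers mounted under a tmpfs
    *)         echo "unknown" ;;
  esac
}

# Succeed if cgroup_no_v1=all appears as a whole word in the given file.
has_no_v1_flag() {
  grep -qw 'cgroup_no_v1=all' "${1:-/proc/cmdline}" 2>/dev/null
}

main() {
  local fstype
  # GNU stat: -f queries the filesystem, %T prints its type by name.
  fstype=$(stat -fc %T /sys/fs/cgroup 2>/dev/null || echo none)
  if has_no_v1_flag /proc/cmdline; then
    echo "cgroup_no_v1=all: present on kernel command line"
  else
    echo "cgroup_no_v1=all: not found on kernel command line"
  fi
  echo "cgroup layout: $(cgroup_mode "$fstype") ($fstype)"
}

if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  main
fi
```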

@CoachYT1
Author

(Replying to @9numbernine9's cgroup_no_v1=all workaround.)

[screenshot]

Same

@xuangeyouneihan

xuangeyouneihan commented Mar 21, 2024

I have the same problem, and it does not work for me either 😰
The only thing that changed is that the "Tainted: cgroupsv1" message is gone.

@xuangeyouneihan

Well, I found this and modified .wslconfig according to it, and then it worked. But when I renamed .wslconfig to .wslconfig1 (so that the setting disabling cgroups v1 no longer applied), systemd still worked somehow. Then I renamed .wslconfig1 back (so that cgroups v1 was disabled again), backed up the original ext4.vhdx, unregistered ArchWSL, and reinstalled it with a new ext4.vhdx: systemd did not work again. Finally, I deleted .wslconfig and replaced the new ext4.vhdx with the old one, and systemd works. So why does it work with my old ext4.vhdx but not with a new one?

@9numbernine9

I'm running into this issue as well when setting up an ArchWSL instance on a brand new Windows 10 installation (despite my earlier comments about potential workaround/solutions).

Trying to narrow this down a bit further, I started going back through ArchWSL releases:

  • 24.3.11.0 (https://github.com/yuk7/ArchWSL/releases/tag/24.3.11.0) ❌
  • 24.2.24.0 (https://github.com/yuk7/ArchWSL/releases/tag/24.2.24.0) ❌
  • 22.10.16.0 (https://github.com/yuk7/ArchWSL/releases/tag/22.10.16.0) ✔️

What's odd is that it works fine with the last 2022 release; not only that, I can bring all the packages up to date with pacman -Syu and everything still works. I don't know much about how WSL distributions are built, but could something have changed in the initial configuration/bootstrapping process between those releases?

C:\> wsl --version
WSL version: 2.1.5.0
Kernel version: 5.15.146.1-2
WSLg version: 1.0.60
MSRDC version: 1.2.5105
Direct3D version: 1.611.1-81528511
DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows version: 10.0.19045.4170

@xuangeyouneihan

(Replying to @9numbernine9's comment above.)

Does wayland-0 exist in /run/user/$UID with version 22.10.16.0? I found that wayland-0 is missing in version 24.3.11.0 when systemd happens to start successfully; see #357.
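For anyone wanting to answer this for their own install, a small check of the path in question is sketched below. It assumes bash; the helper name is hypothetical, and it only inspects /run/user/$UID/wayland-0 as mentioned above (a plain ls -l of that path gives the same information).

```bash
#!/usr/bin/env bash
# Sketch: report the state of <dir>/wayland-0 (default /run/user/$UID).
# wayland_socket_status is a hypothetical helper name.

wayland_socket_status() {
  local dir="${1:-/run/user/$(id -u)}"
  local path="$dir/wayland-0"
  if [[ -S "$path" ]]; then
    echo "socket"               # a UNIX socket ([[ -S ]] follows symlinks)
  elif [[ -e "$path" || -L "$path" ]]; then
    echo "present-not-socket"   # anything else there, incl. a dangling symlink
  else
    echo "missing"
  fi
}

if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  echo "wayland-0: $(wayland_socket_status "$@")"
fi
```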

@9numbernine9

(Replying to @xuangeyouneihan's question about wayland-0.)

No, it doesn't.

@rayae

rayae commented Mar 31, 2024

I manually built a rootfs with Docker and everything works well.
I think this problem only affects the repo's release.
My build script (built with a China pacman mirror): create-rootfs.sh
[screenshots: user-dbus-wayland-x11, user-systemctl-status, system-systemctl-status]

@mrcaidev

mrcaidev commented Mar 31, 2024

None of these solutions work on my side. Only rolling back to version 22.10.16.0 works.

I'm using version 24.3.31.0 on Windows 11.

wsl --version

WSL version: 2.1.5.0
Kernel version: 5.15.146.1-2
WSLg version: 1.0.60
MSRDC version: 1.2.5105
Direct3D version: 1.611.1-81528511
DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows version: 10.0.22631.3374

@xuangeyouneihan

Does systemd work normally in v24.3.31.0, released yesterday?

@yuk7
Owner

yuk7 commented Apr 1, 2024

@xuangeyouneihan
Sorry, nothing has changed on that front in that release

@xuangeyouneihan

(Replying to @yuk7.)

Hope this will be fixed soon 😂
BTW, do you have any idea what caused this issue?

@WH-2099

WH-2099 commented Apr 1, 2024

(Replying to @xuangeyouneihan's question about v24.3.31.0.)

v24.3.31.0 still does not work in my environment.

WSL Version: 2.2.1.0
Kernel Version: 5.15.150.1-2
WSLg Version: 1.0.60
MSRDC Version: 1.2.5105
Direct3D Version: 1.611.1-81528511
DXCore Version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows Version: 10.0.22635.3420

yuk7 added the bug label on Apr 2, 2024
@WH-2099

WH-2099 commented Apr 5, 2024

After testing combinations of Arch.exe, wsldl.exe and rootfs.tar.gz, I now suspect that the problem is mainly related to rootfs.tar.gz, and most likely to systemd-firstboot.service.
I'm continuing to troubleshoot the problem.

@WH-2099

WH-2099 commented Apr 5, 2024

I think I found the immediate cause and a temporary solution, but the deeper root cause is still up for debate.
The direct cause is that the systemd boot process gets stuck on systemd-firstboot.service.

The workaround is simple:

  1. Run systemctl list-jobs | grep 'systemd-firstboot.service' to get the job ID of systemd-firstboot.service (its state should be running).
  2. Run systemctl cancel <job-id> to cancel the job.

After that, systemd runs normally, even after restarting WSL.
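The two steps above can be combined into one script. This is a sketch assuming bash, root privileges (run it with sudo), and systemctl list-jobs --no-legend output with the job ID in the first column and the unit name in the second; job_ids_for_unit is a hypothetical helper. It matches the unit column exactly rather than grepping the whole line.

```bash
#!/usr/bin/env bash
# Sketch: cancel the queued systemd-firstboot.service job, if any.
# job_ids_for_unit is a hypothetical helper; run the script with sudo.

# Read `systemctl list-jobs --no-legend` output on stdin and print the
# job IDs (column 1) whose unit (column 2) equals the given name.
job_ids_for_unit() {
  awk -v unit="$1" '$2 == unit { print $1 }'
}

main() {
  local unit="${1:-systemd-firstboot.service}" ids id
  if ! command -v systemctl >/dev/null 2>&1; then
    echo "systemctl not found" >&2
    return 0
  fi
  ids=$(systemctl list-jobs --no-legend --no-pager 2>/dev/null | job_ids_for_unit "$unit")
  if [[ -z "$ids" ]]; then
    echo "no queued job for $unit"
    return 0
  fi
  for id in $ids; do  # IDs are numeric, so word splitting is intended
    systemctl cancel "$id" && echo "cancelled job $id ($unit)"
  done
}

if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  main "$@"
fi
```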


Based on my testing and extrapolation, there are two known issues:

  1. systemd-firstboot.service is not executing properly (I don't know much about this, but from the timeline I suspect it's related to WSLg).
  2. The systemd compatibility layer in the WSL2 kernel has problems determining whether this is the first boot.
    a. Adding systemd.firstboot=false or systemd.condition-first-boot=false to the kernel command line did not prevent systemd-firstboot.service from starting. In fact, judging by the output of systemd-analyze condition 'ConditionFirstBoot=true', the kernel does not seem to be handling the relevant parameters correctly.

Also, according to the official systemd documentation, I recommend removing /etc/machine-id from rootfs.tar.gz in the distribution.

"For operating system images which are created once and used on multiple machines, for example for containers or in the cloud, /etc/machine-id should be either missing or an empty file in the generic file system image (the difference between the two options is described under "First Boot Semantics" below). An ID will be generated during boot and saved to this file if possible." (quoted from machine-id(5))
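The quoted text allows either a missing or an empty /etc/machine-id, but as we read the "First Boot Semantics" section of machine-id(5), the two are not equivalent: a missing file makes systemd treat the boot as a first boot, while an existing empty file does not, and systemd-firstboot.service is conditioned on first boot (ConditionFirstBoot=yes). A minimal sketch of the empty-file variant for an unpacked rootfs follows, assuming bash; blank_machine_id and the directory layout are hypothetical, and whether this avoids the hang in ArchWSL is untested.

```bash
#!/usr/bin/env bash
# Sketch: leave /etc/machine-id present but empty in an unpacked rootfs
# before re-packing it. blank_machine_id is a hypothetical helper name.

# Truncate (or create) <rootdir>/etc/machine-id as a zero-length file.
blank_machine_id() {
  local root="$1"
  mkdir -p "$root/etc" || return 1
  : > "$root/etc/machine-id"   # emptied, deliberately not removed
}

# Only act when run directly with exactly one argument (the rootfs dir).
if [[ "${BASH_SOURCE[0]}" == "$0" && $# -eq 1 ]]; then
  blank_machine_id "$1" && echo "emptied $1/etc/machine-id"
fi
```

Usage, with hypothetical paths: unpack the tarball into a directory (for example tar -xpzf rootfs.tar.gz -C rootfs), run the script with that directory as its only argument, then re-pack it (for example tar -cpzf rootfs.tar.gz -C rootfs .).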


The information I refer to is as follows:
https://www.freedesktop.org/software/systemd/man/latest/systemd-firstboot.html
https://www.freedesktop.org/software/systemd/man/latest/machine-id.html
https://www.freedesktop.org/software/systemd/man/latest/systemd.special.html
https://www.freedesktop.org/software/systemd/man/latest/kernel-command-line.html
https://learn.microsoft.com/en-us/windows/wsl/systemd

@CoachYT1
Author

CoachYT1 commented Apr 5, 2024

[screenshot]
In my case, systemd-networkd-wait-online.service was also blocking the systemd boot process.

@wswind

wswind commented Apr 6, 2024

(Replying to @WH-2099's comment above.)

In @WH-2099's comment above, the spelling should be 'firstboot' instead of 'fisrtboot'.

This is how I fixed this issue:

  1. Cancel running jobs such as systemd-firstboot.service.
  2. Disable systemd-networkd-wait-online.service.

sudo systemctl list-jobs | grep running
sudo systemctl cancel <job-number>
sudo systemctl disable systemd-networkd-wait-online
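The same two steps as a single script, as a rough sketch: it assumes bash, root privileges, and list-jobs --no-legend output where the fourth column is the job state; running_job_ids is a hypothetical helper. Note that it cancels every job in the running state, which is blunter than picking job numbers by hand from the list.

```bash
#!/usr/bin/env bash
# Sketch of the fix above: cancel running jobs, then disable
# systemd-networkd-wait-online. Run inside ArchWSL with sudo.
# running_job_ids is a hypothetical helper name.

# Read `systemctl list-jobs --no-legend` output on stdin and print the
# job IDs (column 1) whose state (column 4) is "running".
running_job_ids() {
  awk '$4 == "running" { print $1 }'
}

main() {
  local id
  if ! command -v systemctl >/dev/null 2>&1; then
    echo "systemctl not found" >&2
    return 0
  fi
  # Step 1: cancel every job currently in the "running" state.
  for id in $(systemctl list-jobs --no-legend --no-pager 2>/dev/null | running_job_ids); do
    systemctl cancel "$id" && echo "cancelled job $id"
  done
  # Step 2: disable the unit (removes the symlinks from its [Install] section).
  systemctl disable systemd-networkd-wait-online.service \
    || echo "could not disable systemd-networkd-wait-online.service" >&2
}

if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  main
fi
```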

[screenshot]

In my testing, removing /etc/machine-id from rootfs.tar.gz did not fix this issue.

@CnsMaple

(Replying to @rayae's comment about building the rootfs with Docker.)

@rayae Thank you for your script, it's very useful.

@mrcaidev

(Replying to @wswind's fix.)

This fixed my problem. I'm using v24.4.28.0.

@shanoor

shanoor commented Jul 10, 2024

(Replying to @9numbernine9's cgroup_no_v1=all workaround.)

I had an issue with a very long WSL boot and systemd not starting right away (with the infamous "Failed to connect to bus: No such file or directory"); I had to wait 30 seconds and manually run sudo systemctl start user@1000 every time to get systemd back. Your solution worked for me: it's back to how it was before, fast and working. Thanks!

(Replying to @wswind's fix.)

I also had to do this to get Docker working again. Thanks!

@WH-2099

WH-2099 commented Jul 19, 2024

(Replying to @wswind's comment.)

thx

@l3n4QAQ

l3n4QAQ commented Aug 7, 2024

I'm using v24.4.28.0.

Modify ExecStart in /usr/lib/systemd/system/systemd-networkd-wait-online.service.

The new ExecStart should be:
ExecStart=/usr/lib/systemd/systemd-networkd-wait-online -i eth0 --any --timeout=10

Restart WSL:
wsl --shutdown

Check again:
systemctl status
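Files under /usr/lib/systemd/system belong to the systemd package and can be overwritten when it is upgraded, so the same ExecStart change can instead be kept in a drop-in under /etc/systemd/system, which is the kind of file sudo systemctl edit systemd-networkd-wait-online.service creates. A sketch assuming bash and root privileges; write_wait_online_override and its optional root-directory argument are hypothetical. The empty ExecStart= line clears the command inherited from the packaged unit so the new one replaces it instead of being added to it.

```bash
#!/usr/bin/env bash
# Sketch: keep the ExecStart change in a drop-in instead of editing
# /usr/lib/systemd/system/systemd-networkd-wait-online.service.
# write_wait_online_override is a hypothetical helper; run with sudo.

# Write override.conf below <root>/etc/systemd/system; an empty root
# means the live system.
write_wait_online_override() {
  local root="${1:-}"
  local dir="$root/etc/systemd/system/systemd-networkd-wait-online.service.d"
  mkdir -p "$dir" || return 1
  # The empty ExecStart= resets the packaged command; the next line replaces it.
  cat > "$dir/override.conf" <<'EOF'
[Service]
ExecStart=
ExecStart=/usr/lib/systemd/systemd-networkd-wait-online -i eth0 --any --timeout=10
EOF
}

if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  if write_wait_online_override "${1:-}"; then
    echo "override written; run wsl --shutdown to apply it"
  else
    echo "could not write the override (are you root?)" >&2
  fi
fi
```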

@kloon15

kloon15 commented Aug 12, 2024

Proper workaround here: microsoft/WSL#11857
