Unable to Recover from Sudden Power Loss and Reopen Custom WSL 2 Gentoo Distribution Installation After Recent Windows 11 Pro 23H2 Updates #11661

Closed
1 of 2 tasks
RandomDSdevel opened this issue Jun 6, 2024 · 24 comments

Comments

@RandomDSdevel

RandomDSdevel commented Jun 6, 2024

Windows Versions

  • Windows 11 Pro 23H2, build 22631.3672 ('2024-05 Cumulative Update Preview') ('Microsoft Windows [Version 10.0.22631.3672]')
  • Windows 11 Pro 23H2, build 22631.3737 ('2024-06 Cumulative Update') ('Microsoft Windows [Version 10.0.22631.3737]')

WSL Versions

  • 2.1.5.0
  • 2.2.4.0

Are you using WSL 1 or WSL 2?

  • WSL 2
  • WSL 1

Kernel Versions

  • 5.15.146.1-2 (Under WSL v2.1.5.0)
  • 5.15.153.1-2 (Under WSL v2.2.4.0)

Distribution Version

Gentoo (custom installation set up quite some time ago using this guide)

(I'd check for more version information inside a running distribution session, but, per this issue, none will launch.)

Other Software

Nothing applicable that I can think of.

Steps to Reproduce

  1. Be running Windows 11 Pro 23H2, build 22631.3593 ('2024-05 Cumulative Update.') (Or earlier?)

  2. Have an existing custom Gentoo WSL 2 distribution set up like in this guide.

         For the purposes of this section, this has presumably simply been named 'Gentoo;' adjust other instructions if/as needed.

  3. Set 'experimental.sparseVhd' to 'true' in your '.wslconfig.'

  4. In an administrator PowerShell session, run:

    wsl --manage Gentoo --set-sparse true
    
  5. Enable NTFS compression on the custom WSL 2 Gentoo distribution's backing VHDX image file. (For me, this is at '%LocalAppData%\WSL\Gentoo\ext4.vhdx.')

  6. Have at least one terminal session instance running for this custom WSL 2 Gentoo distribution.

  7. Have your system running at high load.

  8. Experience a Windows-side system-wide crash which leaves you with a completely frozen system or a blank or black screen which requires you to do a forced hard restart of your system.

         If you're lucky, you won't encounter the next step, but it can still happen.

  9. Attempt to launch the WSL 2 distribution in question using any method. (Opening using either a direct invocation of the relevant Windows Terminal profile or by running wsl or wsl --distribution Gentoo from inside a PowerShell session.)
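
Step 3 above can be scripted. What follows is a minimal Python sketch, not part of the original report, that sets 'sparseVhd = true' under an '[experimental]' section of '.wslconfig' text. The helper name 'set_wslconfig_option' is hypothetical, the sample file contents are an assumption, and the sketch assumes the file is plain INI that 'configparser' can parse.

```python
import configparser
import io


def set_wslconfig_option(text: str, section: str, key: str, value: str) -> str:
    """Return .wslconfig text with `key = value` set under [section].

    Hypothetical helper for illustration; not part of WSL itself.
    """
    parser = configparser.ConfigParser()
    # Keep key case as written (configparser lower-cases keys by default).
    parser.optionxform = str
    parser.read_string(text)
    if not parser.has_section(section):
        parser.add_section(section)
    parser.set(section, key, value)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


if __name__ == "__main__":
    existing = "[wsl2]\nmemory = 8GB\n"  # assumed example contents
    print(set_wslconfig_option(existing, "experimental", "sparseVhd", "true"))
```

Note that 'configparser' drops comments when it rewrites a file, so keep a copy of the original '.wslconfig' if it contains any.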

Expected Behavior

The custom Gentoo installation should launch again, bringing me to a 'bash' prompt.

Actual Behavior

Distribution launch hangs indefinitely.

Diagnostic Logs

WslLogs-2024-06-05_20-38-59.zip


  1. This started happening some time after my recent update of Windows 11 Pro 23H2 to the 2024-05 Cumulative Update Preview (build 22631.3672.)
  2. I also re-compacted my custom Gentoo installation's VHDX file recently, but I remember it still working after that but before updating Windows. I can't remember for sure if the distribution worked at any point after installing that Windows update; sorry.

github-actions bot commented Jun 6, 2024

View similar issues

Please view the issues below to see if they solve your problem, and if the issue describes your problem please consider closing this one and thumbs upping the other issue to help us prioritize it!

Closed similar issues:

Note: You can give me feedback by thumbs upping or thumbs downing this comment.

Diagnostic information
.wslconfig found
Detected appx version: 2.2.4.0

@RandomDSdevel
Author

     I was on WSL v2.1.5.0 before I updated WSL but was seeing the same behavior under that prior version. I updated to pre-release v2.2.4.0 because I saw this in its release notes for pre-release v2.2.1:

  • Fix hang when the guest crashes during distro initialization

and was wondering if that was related and might help with my issue.


     'WSLService' was stuck on 'Stopping' under the older version after I tried running 'wsl --terminate' at one point; I'm unsure if that's related or not.

@RandomDSdevel
Author

RandomDSdevel commented Jun 6, 2024

     Running 'Test-VHD -Path' on the distribution's underlying backing VHDX file returns 'True,' but I'm unclear whether that means true as in 'the virtual hard disk is usable/not corrupted' or true as in 'there's an error;' the command's documentation doesn't say. (I submitted a piece of feedback about this using the button at the bottom of the page.)

@RandomDSdevel
Author

     I should also mention that this issue persisted across a full reboot of my system.

@RandomDSdevel
Author

RandomDSdevel commented Jun 6, 2024

     Trying to run 'wsl --terminate' also hangs/stalls indefinitely, with no error produced.

@RandomDSdevel
Author

    Searching around led me to #10866, which has some helpful discussion in it.

Next troubleshooting steps:

  • In an administrator PowerShell session, run:

    wsl --distribution Gentoo --system --debug-shell
    

    to get a WSL2 debug shell for my custom Gentoo distribution's corresponding system distribution. In that debug shell:

    • Run:

      ls -aHhl /mnt
      findmnt
      

      to list mounted disks/drives, devices, and device files.

      1. If '/mnt/c' is in either or both of those lists, then run this script.
      2. Otherwise, run this script.
  • Try launching WSL2 with 'safeMode' set to 'true' in the '[wsl2]' section of my '.wslconfig' file.

  • Try launching WSL2 after removing NTFS compression from my custom WSL2 Gentoo distribution's backing VHDX file. (Perhaps also do the additional troubleshooting step of toggling sparse mode on said VHDX file.)

@RandomDSdevel
Author

     Trying to run 'wsl --distribution Gentoo --system --debug-shell' also hangs/stalls indefinitely, so I can't use that.

@RandomDSdevel
Author

RandomDSdevel commented Jun 8, 2024

  1. I also re-compacted my custom Gentoo installation's VHDX file recently, but I remember it still working after that but before updating Windows. I can't remember for sure if the distribution worked at any point after installing that Windows update; sorry.

     My usual procedure for re-compacting my custom WSL2 Gentoo distribution's backing VHDX file is to open an administrator PowerShell session and run:

cd .\AppData\Local\WSL\Gentoo
wsl --terminate Gentoo && wsl --shutdown && compact /u .\ext4.vhdx && wsl --terminate Gentoo && wsl --shutdown && wsl --manage Gentoo --set-sparse false && wsl --terminate Gentoo && wsl --shutdown && Mount-Vhd -Path .\ext4.vhdx -ReadOnly && Optimize-Vhd .\ext4.vhdx -Mode Full && Dismount-Vhd .\ext4.vhdx && wsl --terminate Gentoo && wsl --shutdown && wsl --manage Gentoo --set-sparse true && wsl --terminate Gentoo && wsl --shutdown && compact /c .\ext4.vhdx
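
The one-liner above is hard to audit at a glance. As a rough Python sketch (not what I actually ran), the same sequence can be expanded into an ordered list of steps that stops at the first failure, mirroring the '&&' chaining. The names 'recompact_steps' and 'run_steps' are hypothetical, and wrapping the Hyper-V cmdlets in 'powershell -Command' is an assumption about how one would call them from outside PowerShell.

```python
from typing import Callable, List, Sequence

Command = List[str]


def recompact_steps(distro: str, vhdx: str) -> List[Command]:
    """Expand the re-compaction one-liner into an ordered list of commands."""
    # The original repeats this pair before every real step.
    stop = [["wsl", "--terminate", distro], ["wsl", "--shutdown"]]

    def ps(script: str) -> Command:
        # Hyper-V cmdlets have to run inside PowerShell (assumed wrapper).
        return ["powershell", "-NoProfile", "-Command", script]

    return [
        *stop, ["compact", "/u", vhdx],
        *stop, ["wsl", "--manage", distro, "--set-sparse", "false"],
        *stop,
        ps(f"Mount-VHD -Path '{vhdx}' -ReadOnly"),
        ps(f"Optimize-VHD -Path '{vhdx}' -Mode Full"),
        ps(f"Dismount-VHD -Path '{vhdx}'"),
        *stop, ["wsl", "--manage", distro, "--set-sparse", "true"],
        *stop, ["compact", "/c", vhdx],
    ]


def run_steps(steps: Sequence[Command],
              runner: Callable[[Command], int]) -> int:
    """Run steps in order and stop at the first non-zero exit code.

    Returns the number of steps that completed successfully.
    """
    for done, step in enumerate(steps):
        if runner(step) != 0:
            return done
    return len(steps)


if __name__ == "__main__":
    # Dry run: print each command instead of executing it.
    def show(cmd: Command) -> int:
        print(" ".join(cmd))
        return 0

    run_steps(recompact_steps("Gentoo", r".\ext4.vhdx"), show)
```

Passing 'lambda cmd: subprocess.run(cmd).returncode' as the runner (after 'import subprocess') would execute the steps for real; the dry run shown here only prints them.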
  1. 'WSLService' was stuck on 'Stopping' under the older version after I tried running 'wsl --terminate'

    This has continued to be the case. 'wsl --shutdown' works if I haven't tried launching my custom Gentoo distribution first, but it doesn't work if I have. I can run:

    taskkill /F /PID $WSLService_PID
    Stop-Service LxssManager
    

    instead to work around this issue.

  2. Multiple attempts to run 'compact /u .\ext4.vhdx' have currently been giving me this error:

     Uncompressing files in C:\Users\zadmin\AppData\Local\WSL\Gentoo\
    
    ext4.vhdx [ERR]
    ext4.vhdx: The requested operation could not be completed due to a file system limitation
    
    0 files within 1 directories were uncompressed.
    

    I've never seen this happen before.

  3. Toggling whether my custom WSL2 Gentoo distribution's backing VHDX file is sparse or not (using 'wsl --manage Gentoo --set-sparse <true|false>') doesn't change this.

  4. Since I can't remove NTFS compression from my custom WSL2 Gentoo distribution's backing VHDX file, I can't check whether either of the following helps:

    1. running 'Optimize-Vhd' to request that Hyper-V defragment it for me or
    2. following these steps to attempt to defragment it manually.

    Similarly, this means I can't mount the VHDX file using 'Mount-Vhd' right now.
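
For the 'taskkill /F /PID $WSLService_PID' workaround earlier in this comment, the PID has to come from somewhere. One option (an assumption, not something the report spells out) is to read it from the output of 'sc queryex WSLService'. A small Python sketch of that parsing step follows; 'parse_service_pid' is a hypothetical name, and the sample text is illustrative rather than captured from my machine.

```python
import re
from typing import Optional


def parse_service_pid(sc_output: str) -> Optional[int]:
    """Extract the PID from `sc queryex <service>` output, or None."""
    match = re.search(r"^\s*PID\s*:\s*(\d+)\s*$", sc_output, re.MULTILINE)
    if match is None:
        return None
    pid = int(match.group(1))
    # A PID of 0 means the service has no running process to kill.
    return pid or None


if __name__ == "__main__":
    # Illustrative sample, not captured from a real system.
    sample = (
        "SERVICE_NAME: WSLService\n"
        "        STATE              : 3  STOP_PENDING\n"
        "        PID                : 4321\n"
    )
    print(parse_service_pid(sample))
```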

@RandomDSdevel
Author

     Attempting to launch my custom WSL2 Gentoo distribution under WSL2 'safeMode' isn't working.

@withinboredom

If it is any solace, I've been dealing with numerous issues since installing that update, mostly:

  • mirrored networking randomly failing
  • memory reclamation causing systemd to hang and stop reaping processes
  • usbip failing to connect properly

I'm nearly giving up at this point as I can't work like this.

@RandomDSdevel
Author

RandomDSdevel commented Jun 8, 2024

(Aside:)

systemd

     I use OpenRC with the Gentoo systemd-utils compatibility layer.

@RandomDSdevel
Author

@withinboredom:

     Have you reported any of your issues, either together or separately?
     I see you've reported your first issue as #11672.

@RandomDSdevel
Author

     This issue persists after I've installed the 2024-06 Cumulative Update, updating to Windows 11 Pro 23H2, build 22631.3737.

RandomDSdevel changed the title from 'Custom WSL 2 Gentoo Distribution Installation Fails to Launch After Updating Windows 11 Pro 23H2 with the 2024-05 Cumulative Update Preview' to 'Custom WSL 2 Gentoo Distribution Installation Fails to Launch After Recent Windows 11 Pro 23H2 Updates' on Jun 12, 2024

@RandomDSdevel
Author

  1. I also re-compacted my custom Gentoo installation's VHDX file recently, but I remember it still working after that but before updating Windows. I can't remember for sure if the distribution worked at any point after installing that Windows update; sorry.

     My usual procedure for re-compacting my custom WSL2 Gentoo distribution's backing VHDX file is to open an administrator PowerShell session and run:

cd .\AppData\Local\WSL\Gentoo
wsl --terminate Gentoo && wsl --shutdown && compact /u .\ext4.vhdx && wsl --terminate Gentoo && wsl --shutdown && wsl --manage Gentoo --set-sparse false && wsl --terminate Gentoo && wsl --shutdown && Mount-Vhd -Path .\ext4.vhdx -ReadOnly && Optimize-Vhd .\ext4.vhdx -Mode Full && Dismount-Vhd .\ext4.vhdx && wsl --terminate Gentoo && wsl --shutdown && wsl --manage Gentoo --set-sparse true && wsl --terminate Gentoo && wsl --shutdown && compact /c .\ext4.vhdx

  1. Multiple attempts to run 'compact /u .\ext4.vhdx' have currently been giving me this error:

     Uncompressing files in C:\Users\zadmin\AppData\Local\WSL\Gentoo\
    
    ext4.vhdx [ERR]
    ext4.vhdx: The requested operation could not be completed due to a file system limitation
    
    0 files within 1 directories were uncompressed.
    

    I've never seen this happen before.

  2. Toggling whether my custom WSL2 Gentoo distribution's backing VHDX file is sparse or not (using 'wsl --manage Gentoo --set-sparse <true|false>') doesn't change this.

  3. Since I can't remove NTFS compression from my custom WSL2 Gentoo distribution's backing VHDX file, I can't check whether either of the following helps:

    1. running 'Optimize-Vhd' to request that Hyper-V defragment it for me or
    2. following these steps to attempt to defragment it manually.

    Similarly, this means I can't mount the VHDX file using 'Mount-Vhd' right now.

     Where can I find a full list of conditions that might prevent the removal of NTFS compression from a file? I've looked around for one a bit, but I haven't found one yet.


     Starting WSL 2 with:

debugConsole = true

set in my '.wslconfig' (under its '[wsl2]' section) does spawn gobs of 'dmesg' output, but I don't (yet) know how I might be able to capture that from Windows's side.

@RandomDSdevel
Author

     Where can I find a full list of conditions that might prevent the removal of NTFS compression from a file? I've looked around for one a bit, but I haven't found one yet.

     The possible conditions for this that I know about so far include when:

  • The VHDX image is sparse.

         (I don't think this actually matters. For mounting a VHDX image on the Windows side using 'Mount-Vhd' or compacting it using 'Optimize-Vhd?' Yes. For running 'compact /u' on that VHDX image's backing file? I've been able to toggle the disk image's sparseness either before or after uncompressing it before; it just isn't working now whether the VHDX image file is sparse or not. It must be something else, then.)

  • (If the Windows Virtual Machine Platform, and by extension WSL2, supports Hyper-V checkpoints:)

    The VHDX image has outstanding Hyper-V checkpoints.

         Checking this requires that either Hyper-V or I be able to mount the VHDX image; this still isn't the case right now. Is there any way to check a VHDX file for Hyper-V checkpoints (and remove them) without mounting it?

  • The VHDX image has outstanding shadow copies.

    • Similarly, checking this requires that I'm able to mount the VHDX image from the Windows side; this still isn't the case. Is there any way to check a VHDX file's contained image for volume shadow copies (and remove them) without mounting it?
    • I'm not sure that this applies to a Linux guest and might be surprised if it does?

@RandomDSdevel
Author

     This documentation on how to repair a WSL2 VHD(X) mounting error might be helpful to try, perhaps?


     Hmmm. Maybe it's not a recent Windows update that's the root cause of my issue here. I remember having to force a hard restart after a (Windows) host system crash. (See also the above.)

@RandomDSdevel
Author

RandomDSdevel commented Jun 15, 2024

     This documentation on how to repair a WSL2 VHD(X) mounting error might be helpful to try, perhaps?

     I set a spare WSL distribution instance of another Linux distribution up and have had 'e2fsck' running over my custom Gentoo distribution's backing VHDX file since late yesterday. It's still going. I'll report back on whether that worked or not when it's done.

@RandomDSdevel
Author

RandomDSdevel commented Jun 21, 2024

     Starting WSL 2 with:

debugConsole = true

set in my '.wslconfig' (under its '[wsl2]' section)…

     Starting WSL 2 with:

kernelCommandLine = "single"

isn't any help, either.

@RandomDSdevel
Author

This documentation on how to repair a WSL2 VHD(X) mounting error might be helpful to try, perhaps?

     I set a spare WSL distribution instance of another Linux distribution up and have had 'e2fsck' running over my custom Gentoo distribution's backing VHDX file since later on yesterday. It's still going. I'll report back on whether that worked or not when it's done.

     I've gotten stuck making progress here. What follows are the further steps I've continued to attempt so far; I:

  • Started following the steps from the relevant section from Microsoft's 'How to manage WSL disk space' documentation, 'How to repair a VHD mounting error:'

    • Ran the following command in an administrator PowerShell session:

      cd ~\AppData\Local\WSL\Gentoo\
      wsl --install --distribution Debian # You can use any Linux distribution for this, of course, but Debian is what I picked.  
      
    • In the resulting WSL 2 terminal session:

      • Ran initial distribution setup as prompted, giving my initial, administrator account a user name and password.
      • Exited the terminal session.
    • Set a new Windows Terminal profile up for the new Debian distribution since this didn't happen automatically.

    • Ran the following command in the same administrator PowerShell session from before:

      wsl --shutdown
      
    • Started a new Windows Terminal Debian session.

    • Ran the following command back in the same administrator PowerShell session I'd been using:

      wsl --mount .\ext4.vhdx --vhd --bare
      
    • Ran the following commands in the open Windows Terminal Debian session from before:

      …~$ lsblk
      NAME MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
      sda    8:0    0 388.6M  1 disk
      sdb    8:16   0     8G  0 disk [SWAP]
      sdc    8:32   0     1T  0 disk /mnt/wslg/distro
                                     /
      sdd    8:48   0   256G  0 disk
      …~$ sudo -i # (Supplying my in-distribution account password when prompted.)  
      …~# e2fsck -fptv -C 0 -E discard /dev/sdd # ← I'm stuck here.  
      
  • The last command starts executing just fine but then gets to a certain point shortly into the process and starts hanging in uninterruptible sleep.

  • Completely force-terminating the containing WSL 2 Debian distribution instance and WSL 2 itself, then restarting WSL, definitely isn't the best idea, but it allowed me to attempt retrying 'e2fsck' as invoked above.

  • Attempts to retry 'e2fsck' as invoked above reproduced the same results.

         I even got a message in its output that it couldn't find /dev/sdd's associated unmounted filesystem's '/lost+found' directory even though the first attempt and other previous attempts kept saying it'd been created.

  • Installing sg3-utils and using that didn't reveal any useful debugging information, though I did see that '/dev/sdd' was getting stuck with an invalid SCSI CDB buffer.

  • Peeking at 'dmesg' logs revealed the same flood of what must be some kind of SCSI subsystem/interface error message. I saw the same message when I tried launching the custom WSL 2 Gentoo instance I'm trying to repair with 'wsl2.debugConsole' set to 'true' in my '.wslconfig.'

  • I'd collect more logs to provide, but I don't want to risk possibly corrupting my custom WSL 2 Gentoo distribution's filesystem image any further.
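
Picking the right device for 'e2fsck' out of the 'lsblk' listing earlier in this comment matters, since running it against the wrong disk would be bad. As a hedged Python sketch (the helper name 'unmounted_writable_disks' is hypothetical, and it only understands the default column layout shown above), this filters the listing down to read-write disks with no mountpoint, which is where a '--bare' mount should show up:

```python
from typing import List


def unmounted_writable_disks(lsblk_output: str) -> List[str]:
    """Return names of disks that are read-write and have no mountpoint."""
    candidates = []
    for line in lsblk_output.splitlines():
        fields = line.split()
        # Skip the header, blank lines, and continuation lines that only
        # carry an extra mountpoint (such as the lone "/").
        if len(fields) < 6 or fields[0] == "NAME":
            continue
        name, _majmin, _rm, _size, read_only, dev_type = fields[:6]
        mountpoints = fields[6:]
        if dev_type == "disk" and read_only == "0" and not mountpoints:
            candidates.append(name)
    return candidates


if __name__ == "__main__":
    listing = """\
NAME MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
sda    8:0    0 388.6M  1 disk
sdb    8:16   0     8G  0 disk [SWAP]
sdc    8:32   0     1T  0 disk /mnt/wslg/distro
                                     /
sdd    8:48   0   256G  0 disk
"""
    print(unmounted_writable_disks(listing))
```

This is only a filter over text; a disk whose partitions are mounted would still be listed, so the result is a candidate to double-check, not a guarantee.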

RandomDSdevel changed the title from 'Custom WSL 2 Gentoo Distribution Installation Fails to Launch After Recent Windows 11 Pro 23H2 Updates' to 'Unable to Recover from Sudden Power Loss and Reopen Custom WSL 2 Gentoo Distribution Installation After Recent Windows 11 Pro 23H2 Updates' on Jun 24, 2024
@RandomDSdevel
Author

     I've updated this issue's OP with additional, more detailed reproduction instructions.


View similar issues

Please view the issues below to see if they solve your problem, and if the issue describes your problem please consider closing this one and thumbs upping the other issue to help us prioritize it!

Open similar issues:

Closed similar issues:

Note: You can give me feedback by thumbs upping or thumbs downing this comment.

Diagnostic information
.wslconfig found
Detected appx version: 2.2.4.0

@RandomDSdevel
Author

     …Huh, the rare helpful wild bot comment appears. Thanks for pointing me at #10873, @github-actions! This comment there in particular pointed out this potentially useful further direction for me to look into:

Thankfully I've been able to recover the system image with the following command: wsl --import new_distro .\new_distro\ <path to ext4.vhdx> --vhd

Some complications, though:

  1. I don't have enough free disk space to export a copy of my custom WSL 2 Gentoo distribution's current VHDX image file to.

  2. There isn't currently a way to unregister a distribution from WSL without also completely deleting/removing its backing VHDX image file. (Cf. at least some discussion in #9932, 'wsl --unregister <Distro> should warn users that their data will be permanently deleted.')

  3. There isn't any 'wsl --export-in-place' command I can run.

         (Admittedly, though, having such a command might not make much sense and could be at least a bit weird. WSL probably isn't robust against external processes messing with files private to WSL via raw filesystem operations from the Windows side, as opposed to operations communicated over 9p, even when the files belong to a distribution that isn't running; I'd be surprised if it was. Even if a distribution was 'exported' 'in place,' that'd likely be logically equivalent to unregistering a distribution without deleting its underlying backing VHDX image file just to be extra safe, which, again, isn't something that WSL currently supports.)

Thankfully, I have a spare external hard drive meant for another system lying around, so I could just use that.

@RandomDSdevel
Author

     I was able to recover my custom WSL 2 Gentoo distribution and its underlying VHDX disk image! Here's what I had to do:

  1. Grab a spare external hard drive, connect it to my machine, format it as NTFS, and attach it as drive 'E:.'

  2. Back a copy of '%LocalAppData%\WSL' up to 'E:\Manual Copy.'

  3. Create the folder 'E:\Exported.'

  4. Run the following commands in an administrator PowerShell session:

    wsl --export Gentoo E:\Exported\ext4.vhdx --vhd
    wsl --unregister Gentoo
    wsl --import Gentoo $env:LocalAppData\WSL\Gentoo\ E:\Exported\ext4.vhdx --vhd --version 2
    wsl --set-default Gentoo
    wsl --manage Gentoo --set-sparse true
    compact /c $env:LocalAppData\WSL\Gentoo\ext4.vhdx
    
  5. Attempt to open a new Gentoo WSL 2 session. (This succeeded!)

  6. Remove my backup/debug WSL 2 Debian distribution by running this command in the same administrator PowerShell session from before:

    wsl --unregister Debian
    
  7. Delete the Windows Terminal Debian profile I had to set up manually before when it didn't get set up automatically.
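
Since 'wsl --unregister' deletes the distribution's backing VHDX, the order of the export, unregister, and import commands in step 4 is what makes this safe. Here is a minimal Python sketch of that ordering with one extra guard that I did not script myself: refuse to unregister unless the exported file exists and is non-empty. The function names are hypothetical, and a size check is only a weak sanity test, not proof that the export is intact.

```python
import os
from typing import Callable, List

Command = List[str]


def export_looks_usable(export_path: str) -> bool:
    """True if the exported VHDX exists and is not empty (weak check)."""
    return os.path.isfile(export_path) and os.path.getsize(export_path) > 0


def recovery_commands(distro: str, export_path: str,
                      install_dir: str) -> List[Command]:
    """Build the export / unregister / import sequence, in order."""
    return [
        ["wsl", "--export", distro, export_path, "--vhd"],
        ["wsl", "--unregister", distro],
        ["wsl", "--import", distro, install_dir, export_path,
         "--vhd", "--version", "2"],
    ]


def run_recovery(distro: str, export_path: str, install_dir: str,
                 runner: Callable[[Command], int]) -> bool:
    """Run the sequence, never unregistering without a usable export."""
    export, unregister, import_ = recovery_commands(
        distro, export_path, install_dir)
    if runner(export) != 0 or not export_looks_usable(export_path):
        return False
    return runner(unregister) == 0 and runner(import_) == 0
```

The 'runner' argument is any callable that executes one command and returns its exit code, which keeps the sketch testable without touching a real WSL installation.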


Closing as:
