Topic packets from Windows to WSL interface disappearing (ROS2 app) #11966

Open
1 of 2 tasks
borjamunozf opened this issue Aug 28, 2024 · 20 comments

@borjamunozf

Windows Version

10.0.22631.4037

WSL Version

2.2.24

Are you using WSL 1 or WSL 2?

  • [x] WSL 2
  • [ ] WSL 1

Kernel Version

5.15.146

Distro Version

Ubuntu 22.04

Other Software

WSL version: 2.2.4.0
Kernel version: 5.15.153.1-2
WSLg version: 1.0.61
MSRDC version: 1.2.5326
Direct3D version: 1.611.1-81528511
DXCore version: 10.0.26091.1-240325-1447.ge-release
Windows version: 10.0.22631.4037

Repro Steps

  • Using Cyclone DDS as the ROS2 middleware (rmw_cyclonedds_cpp)
  • sudo apt install ros-humble-ros-base

Expected Behavior

Packets are not lost in WSL and performance is as expected (~16 Hz)

Actual Behavior

Three developers at my company have encountered strange performance issues in an application. This did not occur before, without WSL!

The scenario is the following:

  • Windows Unity application.
  • WSL Ubuntu 22.04 running ROS2
  • Wireshark analyzing the WSL Hyper-V adapter.
    From the Windows side all looks good: the topic is published every ~6 ms, like clockwork.
    Image

But on the WSL side a lot of packets are missing. It receives only about 1 update per second on average.

Image

I have raised /proc/sys/net/core/rmem_max to a higher value (2203936) and added this option to the kernel configuration:
CONFIG_IP_MULTICAST=y

This seems to have slightly improved performance. However, it starts out really well and afterwards it just gets bad again.

Image

Diagnostic Logs

No response


Logs are required for review from WSL team

If this is a feature request, please reply with '/feature'. If this is a question, reply with '/question'.
Otherwise, please attach logs by following the instructions below; your issue will not be reviewed unless they are added. These logs will help us understand what is going on on your machine.

How to collect WSL logs

Download and execute collect-wsl-logs.ps1 in an administrative powershell prompt:

Invoke-WebRequest -UseBasicParsing "https://raw.githubusercontent.com/microsoft/WSL/master/diagnostics/collect-wsl-logs.ps1" -OutFile collect-wsl-logs.ps1
Set-ExecutionPolicy Bypass -Scope Process -Force
.\collect-wsl-logs.ps1

The script will output the path of the log file once done.

If this is a networking issue, please use collect-networking-logs.ps1, following the instructions here

Once completed please upload the output files to this GitHub issue.

Click here for more info on logging
If you choose to email these logs instead of attaching to the bug, please send them to [email protected] with the number of the github issue in the subject, and in the message a link to your comment in the github issue and reply with '/emailed-logs'.

@borjamunozf
Author

borjamunozf commented Aug 30, 2024

Interesting update:

  • Setting up the same scenario on two laptops with Windows 10, WSL 2.2.4/2.1.5 and the standard kernel (or the customized one, 5.15.146.1), we get roughly the expected performance and packets.

The laptops with expected performance:
Laptop 1

  • Windows 10.0.19045.4780 - WSL 2.1.5
  • 12th Gen Intel(R) Core(TM) i9-12950HX
    Logical CPU cores: 24
    Has AVX512: no
    L1 cache size: 48 KiB
    L2 cache size: 1280 KiB
    L3 cache size: 30 MiB
    L1 cache sharing: 2 threads
    L2 cache sharing: 2 threads
    L3 cache sharing: 24 threads
  • NVIDIA RTX A3000 12GB Laptop GPU

Laptop 2

  • Windows 10 Enterprise 22H2 19045.3570 - WSL 2.2.4
  • Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz
    Logical CPU cores: 12
    Has AVX512: no
    L1 cache size: 32 KiB
    L2 cache size: 256 KiB
    L3 cache size: 12 MiB
    L1 cache sharing: 2 threads
    L2 cache sharing: 2 threads
    L3 cache sharing: 12 threads
  • NVIDIA RTX A3000 12GB Laptop GPU

Right now we suspect an interaction between WSL, Windows 11 and Intel P/E cores, as the faulty laptops use a 13th Gen Intel(R) Core(TM) i9-13950HX.

Not sure if this could be connected to #9190.

@CatalinFetoiu
Collaborator

thanks for reporting the issue. can you please collect networking logs using instructions at https://github.com/microsoft/WSL/blob/master/CONTRIBUTING.md#collect-wsl-logs-for-networking-issues

the script will generate a zip with name starting with "WslNetworkingLogs"

@borjamunozf
Author

> thanks for reporting the issue. can you please collect networking logs using instructions at https://github.com/microsoft/WSL/blob/master/CONTRIBUTING.md#collect-wsl-logs-for-networking-issues
>
> the script will generate a zip with name starting with "WslNetworkingLogs"

The problem is #11961. At my company, every laptop has a separate, dedicated admin user account, so opening PowerShell and running the scripts as admin does not detect the WSL distributions, because they are installed under the normal account.

@CatalinFetoiu
Collaborator

@borjamunozf thanks for following up. is there a way to install and reproduce the problem on the admin account?

@borjamunozf
Author

borjamunozf commented Sep 21, 2024 via email

@borjamunozf
Author

borjamunozf commented Sep 24, 2024

OK, these are the logs we have been able to collect on one laptop. It has been quite tricky to make it work with our company environment and account policy :/ so some data could probably not be gathered.

  • The laptop in this case is a Windows 11 with 2.2.4 WSL
    13th Gen Intel(R) Core(TM) i9-13950HX

Started the application to collect data with tcpdump: extremely poor performance (not even ~1 Hz this time...)

WslNetworkingLogs-2024-09-24_13-58-42.zip

We'll try to get the same logs but in Windows 10 with 2.2.4 WSL and 12th Gen Intel(R) Core(TM) i9-12950HX
that seems to be working well enough.

EDIT: It seems that something is missing... I will update the logs again as soon as I fix this.


No logs.etl found in the archive. Make sure that you ran collect-wsl-logs.ps1 as administrator and that the logs.etl file is in the archive.

Diagnostic information
.wslconfig found
	Custom kernel found: 'C:\Users\tmgfos\bzImage_6.6.4'
Detected appx version: 2.3.17.0
optional-components.txt not found
No logs.etl found in archive.
Error while parsing the logs. See action page for details

@borjamunozf
Author

The log is far larger than 25 MB: it is about 200 MB. How could we forward it to you?

@CatalinFetoiu
Collaborator

@borjamunozf, if possible, can you please put the logs.etl file (and the rest of the files in the WslNetworkingLogs zip) behind a OneDrive or Google Drive link and share it with us?

Thanks

@borjamunozf
Author

borjamunozf commented Sep 29, 2024

Here it is, @CatalinFetoiu

WSL Logs

let me know if you have any issue with the access.

@borjamunozf
Author

borjamunozf commented Oct 22, 2024

Any news regarding this? Anything we could do to help? Any ideas? It's really critical for a lot of devs to be able to locate and fix this problem.

My best guess is that the combination of 12th/13th Gen Intel CPUs, Windows 11 and WSL is the root cause, but what to do about it is another story.
Thanks

@borjamunozf
Author

borjamunozf commented Nov 7, 2024

Update with more tests. It's clear that for some reason the CPU cores are not being used?

  • Test with core isolation (memory integrity) disabled
    No result
    Image

  • Test with the High performance power plan and core isolation memory integrity disabled
    No result
    Image

  • Multiple tests with Process Lasso
    Setting vmmemWSL with Induced Performance Mode and the Host processor app
    Setting vmmemWSL with the High performance profile
    Setting vmmemWSL with the disabled-Hyperthreading CPU affinity...

This is the current view: 11% processor usage with the tool running

Image

More differences in the UDP traffic generated on Windows 10 (OK) vs Windows 11 (disastrous):
So for each packet he (Windows 10) gets around 1516 * 9 + 248 = 13892 (presumably bytes, since Wireshark reports frame lengths in bytes)
And I (Windows 11) get 1514 * 44 + 422 = 67038 (presumably bytes)

Image
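
The two totals above can be re-checked with a few lines of shell. This is only a sketch under stated assumptions: the numbers are taken to be Wireshark frame lengths in bytes (1514 bytes is a full Ethernet frame at MTU 1500), each sample is assumed to arrive as a run of full-size frames plus one shorter trailing frame, and the first figure is assumed to belong to the Windows 10 capture. `sample_bytes` is a hypothetical helper name, not part of ROS2 or WSL.

```shell
#!/usr/bin/env bash
# sample_bytes FULL_LEN FULL_COUNT TAIL_LEN
# Total bytes captured for one sample: FULL_COUNT frames of FULL_LEN bytes
# plus one trailing frame of TAIL_LEN bytes.
sample_bytes() {
  echo $(( $1 * $2 + $3 ))
}

win10=$(sample_bytes 1516 9 248)    # assumed: capture on the working Windows 10 laptop
win11=$(sample_bytes 1514 44 422)   # assumed: capture on the failing Windows 11 laptop
echo "Windows 10: ${win10} bytes per sample"
echo "Windows 11: ${win11} bytes per sample"
```

Under these assumptions the Windows 11 capture carries roughly 4.8 times as many bytes per sample as the Windows 10 one.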

EDITED:

  • Test with networkingMode=mirrored
    For some reason it does not work: it does not receive any data at all. Tested with firewall=true and firewall=false, and with rules applied, but nothing.

  • Test setting up the distro and WSL with the Windows admin account
    Same result
    Image

Image

Any more ideas? Anything else that we can try?

@borjamunozf
Author

borjamunozf commented Nov 8, 2024

More tests:

  • Following the ROS2 documentation, we set:
sysctl -w net.core.rmem_max=4194304
sysctl net.ipv4.ipfrag_time=3
sysctl net.ipv4.ipfrag_high_thresh=134217728
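
To confirm that these values actually took effect inside the distro, they can be read back from /proc/sys without root. A minimal sketch: `sysctl_path` is a hypothetical helper that maps a sysctl key to its file under /proc/sys; nothing here comes from the ROS2 documentation itself.

```shell
#!/usr/bin/env bash
# Map a sysctl key such as net.core.rmem_max to its file under /proc/sys.
sysctl_path() {
  printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Print the current value of each setting changed above.
for key in net.core.rmem_max net.ipv4.ipfrag_time net.ipv4.ipfrag_high_thresh; do
  file=$(sysctl_path "$key")
  if [ -r "$file" ]; then
    printf '%s = %s\n' "$key" "$(cat "$file")"
  else
    printf '%s is not readable here\n' "$key"
  fi
done
```

Values set with sysctl at runtime last only until the kernel restarts, so they have to be reapplied after `wsl --shutdown`.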

Image

Benchmarking UDP from Windows 11 to the WSL2 interface with iperf3

  • It seems OK.

WSL2 side

iperf3 -s

Windows side:

iperf3 -c interfaceWSLIp -u -b 100MB

Image

@borjamunozf
Author

borjamunozf commented Nov 8, 2024

MAJOR UPDATE: CC @OneBlue @CatalinFetoiu (sorry for the spam, you surely have much more to do than check this thread...)

  • Switching the distro to WSL1 works fantastically, performance-wise!
  • We're not sure whether switching the distro to WSL1 is enough on its own, or whether any of the other changes we have applied also play a part.

Image

I'm inclined to think that Hyper-V is responsible for messing this up somehow. Perhaps multicast traffic is being dropped or is not arriving at all.

One more note:

  • lscpu in WSL1 reports the correct number of cores per socket (24)

Image

We'll continue testing with Ubuntu on WSL2 and try to narrow down the scope of this issue and what's going on....

@borjamunozf
Author

Finally, it's working fine in WSL2!

Workaround:

  • Use networkingMode=bridged.
  • Create an External VMSwitch for your network adapter.
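
The exact configuration was not posted in the thread. As a rough sketch, bridged mode is commonly selected in the `[wsl2]` section of `.wslconfig` with the `networkingMode` and `vmSwitch` keys; treat both the key names and the layout as assumptions to verify against your WSL version. The switch name `WSLBridge` is a made-up example and must match an External switch that already exists on Windows (created, for instance, in Hyper-V Manager). The hypothetical helper below only prints the fragment, so it can be reviewed before being copied into `%UserProfile%\.wslconfig`.

```shell
#!/usr/bin/env bash
# Print a .wslconfig fragment that selects bridged networking.
# $1 is the name of an existing External Hyper-V switch (assumption:
# "WSLBridge" is only an example name, not one taken from the thread).
wslconfig_bridged() {
  printf '[wsl2]\nnetworkingMode=bridged\nvmSwitch=%s\n' "$1"
}

wslconfig_bridged "WSLBridge"
```

After the file is edited, `wsl --shutdown` restarts the WSL VM so that the new settings are picked up.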

Hypothesis:

  • A problem with multicast traffic on the default VMSwitch used by WSL2. It seems that the Internal VMSwitch won't forward multicast properly, given the network isolation/limitations of the NAT switch.
  • In theory, I think it should work with networkingMode=mirrored, but it doesn't.

We'll continue the investigation, but this clearly looks like a multicast networking problem between WSL2 and Windows, not an Intel architecture one.

@Meno12ivanov

wsl --install
CMakePresets-schema.json

@sbrl

sbrl commented Dec 16, 2024

The official documentation says that for mirrored mode in WSL you need 22H2 or above?

https://learn.microsoft.com/en-us/windows/wsl/networking#mirrored-mode-networking

@shixudong2020

Because the WSL2 (NAT) MTU is 1420 and the Windows host MTU is 1500.
If WSL2 (NAT) is not used with a VPN, you can safely set the WSL2 (NAT) MTU to 1500, or set the Windows host MTU to 1420.
For details, please see WSL2 and VPN (https://blog.csdn.net/sxd2001/article/details/136788434)
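
To check the MTU claim on a given machine, the interface MTU can be read inside the distro without root. A minimal sketch: `iface_mtu` is a hypothetical helper, and `eth0` is assumed to be the name of the WSL2 interface (verify with `ip link`).

```shell
#!/usr/bin/env bash
# Print the MTU of a network interface by reading sysfs (no root needed).
iface_mtu() {
  cat "/sys/class/net/$1/mtu"
}

iface="${1:-eth0}"   # assumption: eth0 is the WSL2 virtual adapter
if [ -r "/sys/class/net/${iface}/mtu" ]; then
  echo "${iface} MTU: $(iface_mtu "$iface")"
else
  echo "interface ${iface} not found"
fi

# To change it for the current session (requires root, not run here):
#   sudo ip link set dev eth0 mtu 1500
```

On the Windows side, `netsh interface ipv4 show subinterfaces` lists the MTU of each interface for comparison.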
