Properly mirror all packets (including kernel sockets) for mirrored network mode #10842
Comments
Could you please follow the steps below and attach the diagnostic logs? https://github.com/microsoft/WSL/blob/master/CONTRIBUTING.md#collect-wsl-logs-for-networking-issues
@chanpreetdhanjal I'm 99% sure this problem won't show up in the diagnostic logs, but I will try.
It is true that kernel sockets cannot be mirrored. Temporary solution:
We have many Windows PCs running k8s in WSL2. These k8s instances connect to the control plane as tainted worker nodes so that everyone can use some heavy infrastructure on the server.
Is your feature request related to a problem? Please describe.
I have investigated this problem for a few weeks now, and I realized that all the problems I have share the following common symptom:
*: Given two machines A and B, if the WSL instance on A sends a packet to machine B, the packet is delivered to B and shows up in Wireshark on the Windows side, but tcpdump inside the WSL instance on B captures nothing.
In addition, some examples are (Without loss of generality):
Given all of these clues, my educated guess is this: the failing cases are all handled by the Linux kernel rather than by userspace. Since userspace programs work, anything that originates in the kernel must somehow be missing from the mirroring.
This theory is supported by this Wireguard workaround: #10841 (comment)
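A quick way to confirm the userspace half of this theory is a plain UDP echo probe. The sketch below is only an illustration: `start_echo_server` and `probe` are hypothetical helper names, not part of WSL or any tool mentioned here, and the port and payload are arbitrary. Run the server inside the WSL instance on B and the probe from A. If the probe succeeds while kernel-originated traffic (Wireguard, VXLAN, IPVS) still fails, that is consistent with the theory, though it does not prove it.

```python
import socket
import threading


def start_echo_server(bind_addr="127.0.0.1", port=0):
    """Start a one-shot userspace UDP echo server; return (thread, bound_port)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    srv.bind((bind_addr, port))
    srv.settimeout(5.0)
    bound_port = srv.getsockname()[1]  # read the port before the thread can close the socket

    def _serve():
        try:
            data, peer = srv.recvfrom(2048)
            srv.sendto(data, peer)  # echo the datagram back unchanged
        except socket.timeout:
            pass
        finally:
            srv.close()

    thread = threading.Thread(target=_serve, daemon=True)
    thread.start()
    return thread, bound_port


def probe(host, port, payload=b"wsl-mirror-probe", timeout=2.0):
    """Send one datagram from a userspace socket; return True if it is echoed back."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as client:
        client.settimeout(timeout)
        client.sendto(payload, (host, port))
        try:
            data, _ = client.recvfrom(2048)
        except (socket.timeout, ConnectionError):
            return False
        return data == payload
```

For a real test between two machines, bind the server to `0.0.0.0` on a fixed port inside WSL on B and call `probe("<address of B>", <port>)` from A.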
I noticed there is something interesting with the following nftables rules (WLOG again):
You can't reproduce those rules with normal nft commands, so I guess this is how mirrored mode is currently implemented (sadly, the technical details are not public): a combination of userspace ptrace (to rewrite socket options and enable mirrored networking transparently), nftables, and Hyper-V networking hacks that zero-copy network packets from the loopback device and then masquerade the traffic to the main interface, back and forth.
If my theory is correct, any kernel-initiated socket ends up stuck forever because it either bypasses the ptrace hook or disregards the nft rules. The packets are delivered, but they lack some important information, so the Hyper-V networking side never learns that the port was registered for mirroring in either the incoming or the outgoing direction. (Sadly, this is also not a true zero-copy packet mirror, because the masquerade relies on stateful conntrack.) In the end, the packet is discarded on the Windows side because there is no receiver and it times out.
To test this theory further, I think IPsec would work if the network connection were handled solely by userspace strongSwan, with only the TLS stuff offloaded to the kernel (I'm not sure whether we can use KTLS that way). Another good test would be checking whether KSMBD (the in-kernel SMB server) works.
Related items:
#10841
#10840
#10730
Describe the solution you'd like
See if this theory is correct, and then check if anything necessary is missing for such scenarios, and try to implement them...
Describe alternatives you've considered
Shove it down your stomach and accept that mirrored network mode is for userspace only
Additional context
Unsurprisingly, this is a missing feature rather than a bug, because the people at MSFT clearly did not expect this kind of rare application.
For us, we are running k0s, kube-proxy under IPVS mode, and VXLAN for Calico. We tried switching between IPIP, VXLAN, Wireguard and raw routing mode (that means
ip route ... via <local mirrored interface IP address>
since all of our WSL machines are on the same L2 network) in Calico, and none of that worked. With a custom kernel and kube-proxy switched to iptables mode, it partly worked, but pod-to-pod communication is still unreachable and the conntrack table is full of 0-length packets under SYN_WAIT somehow.
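To put numbers on the stuck connections, one option is to tally conntrack entries by TCP state. The sketch below is a minimal illustration, not the method used above: `count_tcp_states` is a hypothetical helper, and it assumes input in the text format printed by `conntrack -L` or exposed in `/proc/net/nf_conntrack` (the latter exists only if the kernel was built with conntrack procfs support). The state names are the ones the Linux conntrack code prints.

```python
from collections import Counter

# TCP state names as printed by the Linux conntrack subsystem.
TCP_STATES = {
    "NONE", "SYN_SENT", "SYN_RECV", "ESTABLISHED", "FIN_WAIT",
    "CLOSE_WAIT", "LAST_ACK", "TIME_WAIT", "CLOSE", "SYN_SENT2",
}


def count_tcp_states(lines):
    """Count TCP conntrack entries per state from an iterable of text lines."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if "tcp" not in fields:
            continue  # skip UDP, ICMP, and blank lines
        # The state is the first field that matches a known TCP state name.
        state = next((f for f in fields if f in TCP_STATES), None)
        if state is not None:
            counts[state] += 1
    return counts
```

A large and growing count of half-open states such as `SYN_SENT` with `[UNREPLIED]` entries would match the symptom described above. On a live system the input could be `open("/proc/net/nf_conntrack")` or the captured output of `conntrack -L`, both of which normally need root.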
Rant time:
I'm well aware that WSL is meant to be a single-user, semi-ephemeral developer environment, and not a platform for running critical server applications like K8S.
However, we have a huge sunk cost in Windows for our workstations, so we cannot afford to switch to Linux. To add insult to injury, we need a lot of GPU resources for machine learning, and that support is missing on the Windows side. We wanted to combine the best of both worlds by using kubeflow to manage the idle GPU resources efficiently.
I'm sure any MSFT consultant would suggest that I use an actual Hyper-V VM for this purpose, but as a small startup we unfortunately cannot afford DDA. So until MSFT has official support for GPU-P on Hyper-V VMs (heck, even Windows GPU-P support is hidden behind secret PowerShell commands and options), our best bet is to use WSL2 and its official CUDA GPU support...with a bit of hacking on gpu-operator from me as well.
We do have an ultimate workaround for all of this: network bridge mode. It solves everything mentioned above, except that it was recently "deprecated" in favor of the mirrored network mode...