[dualtor][mux_simulator] Fix mux simulator stuck #15226
What is the motivation for this PR?

Active-standby Dualtor is failing to talk to `mux_simulator`:

`# curl -v http://10.64.246.154:8082/mux/vms24-7/24`
`* Trying 10.64.246.154:8082...`

On the test server, the TCP SYN drop counters keep increasing:

`# netstat -s | grep -i listen`
`1531500 times the listen queue of a socket overflowed`
`1531501 SYNs to LISTEN sockets dropped`

The listen queue of the `mux_simulator` socket on port 8082 is overflowing (Recv-Q is 129 against a backlog of 128):

`# ss -lnt`
`State Recv-Q Send-Q Local Address:Port Peer Address:Port`
`LISTEN 129 128 0.0.0.0:8082 0.0.0.0:*`

`mux_simulator` appears to be stuck in `recvfrom`:

`# strace -p 21315`
`strace: Process 21315 attached`
`recvfrom(6,`

However, there is no existing TCP connection on the test server or the DUT for fd 6. `mux_simulator` is blocked reading from a TCP connection that has already been closed, so it never gets to handle subsequent HTTP requests. This caused the TCP listen queue overflow.

How did you do it?

Enable `mux_simulator` to work in threaded mode, and set the socket timeout to 60s. If a worker thread gets stuck in `recvfrom` like this, the timeout ensures that the thread exits after 60s, so no resources are leaked.

How did you verify/test it?

Run `mux_simulator` with the change.

Signed-off-by: Longxiang Lyu <[email protected]>
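You can check for the listen-queue symptom in the motivation section without reading `ss` output by eye. On Linux, `ss` reports two values for a socket in LISTEN state. Recv-Q is the number of connections waiting for `accept()`, and Send-Q is the configured backlog. This is why 129 against 128 means the queue is full.

The sketch below parses that text and flags listeners whose Recv-Q has grown past Send-Q. It is a minimal illustration, assuming the column layout shown above. `find_full_listeners` and `ListenerQueue` are hypothetical helpers and are not part of this PR.

```python
from typing import List, NamedTuple


class ListenerQueue(NamedTuple):
    """Accept-queue usage of one listening TCP socket, as printed by `ss -lnt`."""
    local: str
    recv_q: int  # connections waiting for accept()
    send_q: int  # configured backlog


def find_full_listeners(ss_output: str) -> List[ListenerQueue]:
    """Return LISTEN sockets whose accept queue has grown past its backlog (hypothetical helper)."""
    full = []
    for line in ss_output.splitlines():
        fields = line.split()
        # Skip the header row and anything that is not a LISTEN row.
        if len(fields) < 5 or fields[0] != "LISTEN":
            continue
        recv_q, send_q = int(fields[1]), int(fields[2])
        # Linux treats the accept queue as full once its length exceeds the backlog.
        if recv_q > send_q:
            full.append(ListenerQueue(local=fields[3], recv_q=recv_q, send_q=send_q))
    return full


# Sample input: the 8082 row is taken from the ss output above; the port 22 row is made up.
sample = """State  Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 129    128    0.0.0.0:8082       0.0.0.0:*
LISTEN 0      128    0.0.0.0:22         0.0.0.0:*
"""
print(find_full_listeners(sample))
```

On the test server, the input could come from `subprocess.run(["ss", "-lnt"], capture_output=True, text=True, check=True).stdout`. Pair it with the increasing `netstat -s` counters to confirm that SYNs are being dropped.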
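The approach in "How did you do it?" combines two ideas: threaded request handling and a per-connection socket timeout. Below is a minimal sketch of both, using only Python's standard library rather than whatever server framework `mux_simulator` itself uses. It is not the code in this PR.

- `ThreadingHTTPServer` handles each accepted connection in its own thread.
- The handler's `timeout` class attribute is applied to each connection socket by `StreamRequestHandler.setup()`. A read from a peer that never sends anything therefore raises a timeout instead of blocking forever.

`MuxHandler`, `start_server`, `SOCKET_TIMEOUT`, and the JSON body are hypothetical placeholders.

```python
import http.client
import socket
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical constant mirroring the 60s socket timeout described above.
SOCKET_TIMEOUT = 60


class MuxHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a mux_simulator HTTP endpoint."""

    # StreamRequestHandler.setup() passes this to settimeout() on the connection,
    # so a stalled read raises a timeout; BaseHTTPRequestHandler then closes it.
    timeout = SOCKET_TIMEOUT

    def do_GET(self):
        body = b'{"status": "ok"}'  # placeholder response body
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the sketch quiet, including "Request timed out" messages


def start_server(host="127.0.0.1", port=0, timeout=SOCKET_TIMEOUT):
    """Serve in a background thread; each connection gets its own worker thread."""
    # Subclass so that each server can use its own timeout value.
    handler = type("Handler", (MuxHandler,), {"timeout": timeout})
    server = ThreadingHTTPServer((host, port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = start_server(timeout=5)
    host, port = server.server_address
    # Simulate a stuck peer: connect, then never send a request.
    stalled = socket.create_connection((host, port))
    # A second client is still answered, since the stalled one only occupies its own thread.
    conn = http.client.HTTPConnection(host, port, timeout=2)
    conn.request("GET", "/mux/vms24-7/24")
    resp = conn.getresponse()
    print(resp.status, resp.read())
    conn.close()
    stalled.close()
    server.shutdown()
    server.server_close()
```

In this sketch, the client that connects and never sends a request ties up only one worker thread, so the next request is answered right away. Once the timeout expires, the server closes the stalled connection and the thread exits. With a single-threaded server, the second request would instead sit in the listen queue until the stalled read returned. Without a timeout, that read would never return, and new connections would keep piling up.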
Cherry-pick PR to 202405: #15294
Cherry-pick PR to 202311: #15354