[action] [PR:15226] [dualtor][mux_simulator] Fix mux simulator stuck #15354

Merged
merged 1 commit into from
Nov 5, 2024

Conversation

mssonicbld
Collaborator

Description of PR

Summary:
Fixes # (issue)

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • Test case(new/improvement)

Back port request

  • 202012
  • 202205
  • 202305
  • 202311
  • 202405

Approach

What is the motivation for this PR?

Active-standby Dualtor is failing to talk to mux_simulator:

# curl -v http://10.64.246.154:8082/mux/vms24-7/24
*   Trying 10.64.246.154:8082...

  • On the test server, the counters for dropped TCP SYNs keep increasing:

# netstat -s | grep -i listen
    1531500 times the listen queue of a socket overflowed
    1531501 SYNs to LISTEN sockets dropped

  • The mux simulator's listen queue is overflowing (Recv-Q, the current queue length, exceeds Send-Q, the configured backlog):

# ss -lnt
State    Recv-Q   Send-Q   Local Address:Port   Peer Address:Port
LISTEN   129      128      0.0.0.0:8082         0.0.0.0:*

  • mux_simulator appears to be stuck in recvfrom:

# strace -p 21315
strace: Process 21315 attached
recvfrom(6,

  • There is no existing TCP connection on the test server/DUT for fd 6.

mux_simulator is blocked reading from an already closed TCP connection, so it cannot handle subsequent HTTP requests, which results in the TCP listen queue overflow.
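The failure mode can be illustrated with a minimal sketch. It is not the mux_simulator code: `OkHandler` and `second_request_is_blocked` are hypothetical names, and the stalled peer is simulated with a client that connects and never sends a request (rather than a connection that has already gone away). A single-threaded server with no socket timeout sits in a blocking read on that connection, so the next request waits in the listen queue until the client gives up.

```python
import http.client
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class OkHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET with a tiny body."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Length", "2")
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def second_request_is_blocked(client_timeout=1.0):
    """Return True if a request stalls behind an idle connection."""
    # Single-threaded server, no socket timeout (the problematic setup).
    server = HTTPServer(("127.0.0.1", 0), OkHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address

    # First client: connects but never sends a request, so the only
    # server thread blocks while reading from this socket.
    idle = socket.create_connection((host, port))
    conn = http.client.HTTPConnection(host, port, timeout=client_timeout)
    try:
        # Second client: its connection waits in the listen queue and
        # is never served while the first one is being read.
        conn.request("GET", "/")
        conn.getresponse().read()
        return False
    except socket.timeout:
        return True
    finally:
        conn.close()
        idle.close()  # unblocks the server so it can shut down
        server.shutdown()
        server.server_close()
```

Calling `second_request_is_blocked()` should report that the second request timed out. With enough stalled or retried clients, the queue shown by `ss -lnt` fills up in the same way.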

How did you do it?

  1. Enable mux_simulator to work in threaded mode.
  2. Set the socket timeout to 60s. If a worker thread gets stuck in recvfrom like this, the timeout ensures the thread exits after 60s, so no resources are leaked.
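The two steps above can be sketched with the Python standard library. This is an illustration under assumptions, not the actual change in this PR: `MuxHandler` and `start_server` are hypothetical names, and the response body is a placeholder. `ThreadingHTTPServer` handles each connection in its own thread, and the handler's `timeout` class attribute is applied to every accepted socket, so a blocked read ends after the timeout and the thread exits.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class MuxHandler(BaseHTTPRequestHandler):
    """Hypothetical handler; the real mux_simulator routes are not shown."""

    # StreamRequestHandler.setup() calls settimeout() on each accepted
    # connection with this value (seconds).
    timeout = 60

    def do_GET(self):
        body = b'{"status": "ok"}'  # placeholder payload (assumption)
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def start_server(host="127.0.0.1", port=0, timeout=60):
    """Start a threaded HTTP server whose worker sockets time out."""
    # Subclass so the timeout can be chosen per server instance.
    handler = type("TimedMuxHandler", (MuxHandler,), {"timeout": timeout})
    server = ThreadingHTTPServer((host, port), handler)  # one thread per connection
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

With this setup an idle or dead connection ties up only its own worker thread, and only until the timeout expires. Other requests are served in the meantime, so the listen queue keeps draining.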

How did you verify/test it?

Run mux_simulator with the change.

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

Signed-off-by: Longxiang Lyu <[email protected]>
@mssonicbld
Collaborator Author

Original PR: #15226

@mssonicbld mssonicbld merged commit 63d9649 into sonic-net:202311 Nov 5, 2024
12 checks passed