Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only send pong if OPEN. #1429

Conversation

dqh-au
Copy link

@dqh-au dqh-au commented Jan 9, 2024

If the server sends a message and then closes the connection, it's possible for the client to fail to recv() the message due to an attempt to send a pong message during the CLOSING state. The following is printed:

ERROR:websockets.client:parser failed
Traceback (most recent call last):
  File "/tmp/throwaway-txYF/lib/python3.10/site-packages/websockets/protocol.py", line 550, in parse
    self.recv_frame(frame)
  File "/tmp/throwaway-txYF/lib/python3.10/site-packages/websockets/protocol.py", line 634, in recv_frame
    self.send_frame(pong_frame)
  File "/tmp/throwaway-txYF/lib/python3.10/site-packages/websockets/protocol.py", line 693, in send_frame
    raise InvalidState(
websockets.exceptions.InvalidState: cannot write to a WebSocket in the CLOSING state

I found that modifying websockets/protocol.py to only send a pong message during the OPEN state allowed me to .recv() the message in this case.

@dqh-au dqh-au force-pushed the dqh/fix-attempt-to-send-pong-while-closing branch from 6fb32fa to e264d6a Compare January 9, 2024 23:19
@aaugustin
Copy link
Member

Thank you for reporting this bug!

The legacy implementation includes this logic, suggesting that this is the right fix in the Sans-I/O layer. I'd like to double check and add a test before merging.

If you're able to reproduce easily, would you be able to provide debug logs by any chance? That would help me understand exactly the scenario in which this happens. In short, you can enable debug logs with:

import logging
logging.basicConfig(format="%(asctime)s %(message)s", level=logging.DEBUG,

@aaugustin aaugustin added the bug label Jan 10, 2024
@cl-dqh
Copy link

cl-dqh commented Jan 11, 2024

So I've realised that this is just an exception being logged by websockets within the context of recv() - the caller still receives the expected message.

I can replicate this reliably in the REPL, relevant logs as follows:

>>> r = n.record(stop_after_seconds=1)
2024-01-11 11:26:40,673 = connection is CONNECTING
2024-01-11 11:26:40,673 > GET /ws/recording HTTP/1.1
2024-01-11 11:26:40,673 > Host: localhost:1024
2024-01-11 11:26:40,673 > Upgrade: websocket
2024-01-11 11:26:40,673 > Connection: Upgrade
2024-01-11 11:26:40,673 > Sec-WebSocket-Key: sZdHgvpcSEDx5Xb1TrkZhA==
2024-01-11 11:26:40,673 > Sec-WebSocket-Version: 13
2024-01-11 11:26:40,673 > Sec-WebSocket-Extensions: permessage-deflate; client_max_window_bits
2024-01-11 11:26:40,673 > User-Agent: Python/3.10 websockets/12.0
2024-01-11 11:26:40,674 < HTTP/1.1 101 Switching Protocols
2024-01-11 11:26:40,674 < Server: TornadoServer/6.4
2024-01-11 11:26:40,674 < Date: Thu, 11 Jan 2024 00:26:40 GMT
2024-01-11 11:26:40,674 < Upgrade: websocket
2024-01-11 11:26:40,674 < Connection: Upgrade
2024-01-11 11:26:40,674 < Sec-Websocket-Accept: uIQSgzo459s7j6C8XsEhagwnPgU=
2024-01-11 11:26:40,674 = connection is OPEN
2024-01-11 11:26:40,674 > TEXT '{"action": "start", "stop_after_seconds": 1}' [44 bytes]
2024-01-11 11:26:40,674 Waiting for status: started
2024-01-11 11:26:40,992 < TEXT '{"file": {"name": "2024-01-11.00-26-40.676136.r...9, "status": "started"}' [186 bytes]
2024-01-11 11:26:40,992 Recieved recording message: {"file": {"name": "2024-01-11.00-26-40.676136.recording.h5", "location": "/tmp", "path": "/tmp/2024-01-11.00-26-40.676136.recording.h5"}, "start_timestamp": 7154849, "status": "started"}
2024-01-11 11:26:40,992 Status is: started
>>> 2024-01-11 11:26:41,992 < TEXT '{"status": "stopped"}' [21 bytes]

>>> 
>>> r.wait_until_stopped()
2024-01-11 11:27:18,364 Waiting for status: stopped
2024-01-11 11:27:18,364 Recieved recording message: {"status": "stopped"}
2024-01-11 11:27:18,364 Status is: stopped
2024-01-11 11:27:18,364 > CLOSE 1000 (OK) [2 bytes]
2024-01-11 11:27:18,365 = connection is CLOSING
2024-01-11 11:27:18,365 < PING '' [0 bytes]
2024-01-11 11:27:18,365 parser failed
Traceback (most recent call last):
  File "/tmp/throwaway-oOZF/lib/python3.10/site-packages/websockets/protocol.py", line 550, in parse
    self.recv_frame(frame)
  File "/tmp/throwaway-oOZF/lib/python3.10/site-packages/websockets/protocol.py", line 634, in recv_frame
    self.send_frame(pong_frame)
  File "/tmp/throwaway-oOZF/lib/python3.10/site-packages/websockets/protocol.py", line 693, in send_frame
    raise InvalidState(
websockets.exceptions.InvalidState: cannot write to a WebSocket in the CLOSING state
2024-01-11 11:27:18,365 < EOF
2024-01-11 11:27:18,365 > EOF
2024-01-11 11:27:18,365 = connection is CLOSED

The logic sequence is as follows:

  1. Client opens connection to server and sends 'start' request, starts waiting for 'started' response
  2. Server receives start request and responses with started
  3. Client receives 'started' and continues to other tasks
  4. 1 second later, server sends 'stopped' and closes the connection
  5. Some time after that, client starts waiting for 'stopped' response
  6. Client receives and processes 'stopped' response - in the context of recv(), an exception is logged.

EDIT: formatting
EDIT2: clarity

@dqh-au
Copy link
Author

dqh-au commented Jan 11, 2024

I discovered that I wasn't closing the websocket server-side after sending the 'stopped' message - I was just dereferencing it. Fixing that seemed to fix the client behaviour. That said, I still think the client shouldn't attempt to send a pong message if it is not in the OPEN state.

@aaugustin
Copy link
Member

Thank you for the details - that's very useful!

@aaugustin
Copy link
Member

The commit message for b2a95c4 suggests that the current behavior of the Sans-I/O layer is intentional. Mmm.

@aaugustin
Copy link
Member

For clarity, the problem happens in the following scenario:

  • Protocol.send_close() sends a Close frame and changes the connection state is CLOSING
  • Protocol.receive_data() receives data containing a Ping frame — this is a race condition between us sending the Close and the other side sending the Ping
    • Protocol.parse calls Protocol.recv_frame(frame), where frame is the Ping frame
    • Protocol.recv_frame calls Protocol.send_frame(pong_frame) to answer the ping
    • Protocol.send_frame crashes because the connection is CLOSING

This can happen either on the client side or the server side.

@aaugustin
Copy link
Member

Above I described the scenario where the connection state is CLOSING because we initiated the closing handshake.

The connection could also be closing because the other side initiated the closing handshake. However, in that case, websockets will never read a Ping frame after a Close frame — because it stops reading entirely after a Close frame:

# 1.4. Closing Handshake: "after receiving a control frame
# indicating the connection should be closed, a peer discards
# any further data received."
self.parser = self.discard()
next(self.parser) # start coroutine

@aaugustin
Copy link
Member

RFC 6455 says:

Upon receipt of a Ping frame, an endpoint MUST send a Pong frame in
response, unless it already received a Close frame.

Note that it doesn't say "... unless it already sent a Close frame".

Also, it says:

The application MUST NOT send any more data frames after sending a
Close frame.

This implies that it's OK to send more control frames, such as Pong frames.


(Since Ping/Pong are primarily a keepalive / heartbeat mechanism, it doesn't seem very useful to send a Pong after a Close. In fact, on the other side of the connection, websockets will ignore the Pong in this scenario — as mentioned in the previous comment it stops reading data frames and control frames after a Close frame. But, hey, it's legal.)


Also, there's comments in the current test suite pointing out that the current behavior doesn't match the spec.

All in all, I'm leaning towards following the spec more closely, meaning allowing sending ping and pong after close.

@aaugustin
Copy link
Member

Superseded by #1435.

@aaugustin aaugustin closed this Jan 27, 2024
aaugustin added a commit that referenced this pull request Jan 27, 2024
@aaugustin
Copy link
Member

In an earlier comment, you said:

I can replicate this reliably in the REPL

If you could double-check that #1435 fixes the problem for you, that would be most useful :-)

aaugustin added a commit that referenced this pull request Jan 27, 2024
@dqh-au
Copy link
Author

dqh-au commented Apr 20, 2024

Sorry for delay - I just confirmed that your fix works for my test case. Thanks! Any feel for when it might make it into a release?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants