Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support strict_exception_groups=True #188

Merged
merged 10 commits into from
Sep 21, 2024

Conversation

jakkdl
Copy link
Member

@jakkdl jakkdl commented Jun 6, 2024

Fixes #132 and #187

  • Changes open_websocket to only raise a single exception, even when running under strict_exception_groups=True
    • Should maybe introduce special handling for KeyboardInterrupts
    • If multiple non-Cancelled-exceptions are encountered, then it will raise TrioWebSocketInternalError with the exceptiongroup as its __cause__. This should only be possible if the background task and the user context both raise exceptions. This would previously raise a MultiError with both Exceptions.
    • other alternatives could include throwing out the exception from the background task, raising an ExceptionGroup with both errors, or trying to do something fancy with __cause__ or __context__.
  • WebSocketServer.run and WebSocketServer._handle_connection are the other two places that opens a nursery. I've opted not to change these, since I don't think user code should expect any special exceptions from it, and it seems less obscure that it might contain an internal nursery.
    • Update docstrings to mention existence of internal nursery.

Copy link
Member

@belm0 belm0 Jun 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi John-- I may be slow in reviewing.

Please make sure the PR description summarizes the changes (e.g. mentions the internal error, API policy about exception groups, etc.).

Would need test coverage for any changes, but ok to hold off until I've reviewed the basics (as I can't promise we'll take the direction of this PR).

I'm itching to add Black and type checking to the CI, but that should be in a separate PR.

Please see #177. On type checking, I'm ok running a loose mypy pass, but reluctant to convert the codebase to strict type annotations, since fighting the type system is often a burden on maintenance (especially with a project on life support).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah thank you, of course I'd complained about lack of Black before 😅

Agree on loose mypy just to check if the annotations that exist are correct. And I don't have any plans on adding annotations everywhere, just doing it for my own sake when developing.

Will try and make PR summary verbose and descriptive 👍

@jakkdl
Copy link
Member Author

jakkdl commented Jun 7, 2024

@belm0 do you have strong opinions on keeping python3.7 support? It's getting problematic to support with lotsa libraries not supporting it. If pinning test requirements to versions that support 3.7 we're not testing anything remotely recent, so would need to generate an additional requirements.txt file, etc etc.

Testing trio versions < 0.22 is also getting messy. If we explicitly want to check for strict_exception_group compatibility the parameter was only introduced in 0.22
I would also love to use trio.testing.RaisesGroup, but at that point the test deps hit trio>=0.24

possible to get them to work, but I'm skeptical if it's worth the effort to increase the complexity of the test infra to get there.

@belm0
Copy link
Member

belm0 commented Jun 9, 2024

@belm0 do you have strong opinions on keeping python3.7 support? It's getting problematic to support with lotsa libraries not supporting it.

I'm ok dropping 3.7 since it's past EOL. (For what it's worth, mainstream packages now think it's ok to drop versions before EOL-- numpy and others dropping 3.8 already.)

Testing trio versions < 0.22 is also getting messy. If we explicitly want to check for strict_exception_group compatibility the parameter was only introduced in 0.22 I would also love to use trio.testing.RaisesGroup, but at that point the test deps hit trio>=0.24

My org uses trio < 0.22 and would like to continue using the latest trio-websocket (otherwise I'm no longer using the package I maintain). My view is that trio's handling of the strict exception group transition is problematic (especially making strict_exception_group a runtime option that can have effect globally on transitive dependencies), so we're just avoiding it.

@jakkdl jakkdl marked this pull request as ready for review June 10, 2024 13:43
@jakkdl
Copy link
Member Author

jakkdl commented Jun 12, 2024

did some messing around:

  • Doing shield = True in close_connection is probably a bad idea when the disconnect_timeout defaults to a full minute. It might be a more proper thing to do, but that'd be a change of behavior that this PR does not care to mess with.
  • I looked at KeyboardInterrupt, and if a user were to cause one while _reader_task happens to be the current task, I think it will cause a TrioWebsocketInternalError. Would need to do signal handling in order to test it, or monkeypatch _reader_task or one of the handlers it uses to trigger this.

tests/test_connection.py Outdated Show resolved Hide resolved
Makefile Outdated Show resolved Hide resolved
accepted_streams: list[
tuple[trio.abc.SendChannel[str], trio.abc.ReceiveChannel[str]]
] = attr.ib(factory=list)
queued_streams = attr.ib(factory=lambda: trio.open_memory_channel[str](1))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a mistake? I don't know why the lambda return expression includes a type annotation ([str]).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the annotation is for trio.open_memory_channel, specifying the type of the objects being sent over the channels. It's not normally possible to specify the type on functions, so Trio says it's a callable class during type checking to allow for that usage: https://github.com/python-trio/trio/blob/4dcdc6dbe899c9ee295fa9c8e6a077100ae19f16/src/trio/_channel.py#L91

Without the annotation, queued_streams would need the same long & verbose type annotation as accepted_streams.

tests/test_connection.py Show resolved Hide resolved
tests/test_connection.py Outdated Show resolved Hide resolved
Comment on lines 48 to 51
class TrioWebsocketInternalError(Exception):
...


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs docs

If multiple non-Cancelled-exceptions are encountered, then it will raise TrioWebSocketInternalError with the exceptiongroup as its cause. This should only be possible if the background task and the user context both raise exceptions. This would previously raise a MultiError with both Exceptions.

other alternatives could include throwing out the exception from the background task, raising an ExceptionGroup with both errors, or trying to do something fancy with cause or context.

It seems odd to raise an internal error where a user-context exception is implicated. Perhaps we can discard the internal error with a warning message, and propagate the user exception.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems odd to raise an internal error where a user-context exception is implicated.

The "internalerror" bit would be that we're encountering a state that the internal logic assumed to be impossible - similar to TrioInternalError. I should've added a message to the exception to make that clearer.

Perhaps we can discard the internal error with a warning message, and propagate the user exception.

I quite like that solution. Only possible problem would be if this makes exception handling harder, but it 1. shouldn't happen, 2. they can still handle the user exception. We can also log detailed info on the exception we're suppressing, so the warning can be relatively brief.

trio_websocket/_impl.py Outdated Show resolved Hide resolved
setup.py Outdated Show resolved Hide resolved
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WebSocketServer.run and WebSocketServer._handle_connection are the other two places that opens a nursery. I've opted not to change these, since I don't think user code should expect any special exceptions from it, and it seems less obscure that it might contain an internal nursery.

In principle I'd like to state that the API doesn't raise exception groups, and that would include the server API. For the user's handler, perhaps add UserRequestHandlerException?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that for users that are embracing strict_exception_groups=True and are used to handling exceptiongroups everywhere we shouldn't purposely obscure the groups and make errors harder for them to handle (and Trio's stance is that this is the default and should be embraced). I haven't delved too too deeply into them to see what it would imply to try and avoid exceptiongroups, but I think it might necessitate a change in API where we might end up changing what exceptions are being raised - even for users sticking with strict_exception_groups=False.

…t in CI.

* Adds loose mypy run to Makefile & CI
* Adds some type annotations
* Adds 3.13-dev test run. It currently fails, but will pass once trio
  and cffi push new releases.
* Adds blurb to readme on status of the project.
* Bump package versions in requirements-dev.txt and
  requirements-dev-full.txt
* Adds small tests to slightly bump code coverage
…m within exceptiongroups

revert making close_connection CS shielded, as that would be a behaviour change causing very long stalls with the default timeout of 60s

add comment for pylint disable

move RaisesGroup import
@jakkdl jakkdl force-pushed the strict_exception_groups_support branch from 42f718f to 7eb779b Compare June 17, 2024 10:23
@jakkdl
Copy link
Member Author

jakkdl commented Jun 17, 2024

I rewrote the history with some squashing rebases, so even without merging #189 one can now selectively review this PR by only viewing 7eb779b in the "Files Changed" tab.

@jakkdl
Copy link
Member Author

jakkdl commented Jun 18, 2024

I looked at KeyboardInterrupt, and if a user were to cause one while _reader_task happens to be the current task, I think it will cause a TrioWebsocketInternalError. Would need to do signal handling in order to test it, or monkeypatch _reader_task or one of the handlers it uses to trigger this.

Okay I think I'm finally starting to understand KeyboardInterrupt. If either _reader_task or the user code raises KeyboardInterrupt, causing the other to get Cancelled, then we simply throw away the Cancelled and raise the original KeyboardInterrupt. But the more complex problem is if the user code raises some exception, and before we finish unwinding _reader_task and/or during close_connection we get a KeyboardInterrupt. At this point we definitely shouldn't suppress KI and just raise a warning, in favor of raising the users code. This is a prime example of when it would be so much easier to always raise a group, but I think what we ultimately have to do is raise ki_exception from user_exception.

belm0 pushed a commit that referenced this pull request Jun 18, 2024
Split out from #188 with a `--patch` checkout. This does destroy the
commit history, so I might need to rewrite the history in #188 to avoid
conflicts.

* Drops python3.7 support. It still works afaik, but is a pain to test
in CI.
* Adds loose mypy run to Makefile & CI
* Adds some type annotations
* ~~Adds 3.13-dev test run. It currently fails, but will pass once trio
and cffi push new releases.~~
* Adds blurb to readme on status of the project.
* Bump package versions in requirements-dev.txt and
requirements-dev-full.txt
* Adds small tests to slightly bump code coverage
@jakkdl
Copy link
Member Author

jakkdl commented Jun 19, 2024

Big rewrite, fairly happy with the state of things - except:

  1. the multiple added tests are very similar and could probably have common code moved into helpers/be parametrized/etc.
  2. I had trouble engineering DisconnectionTimeouts, so there's no tests for exception chaining into that. I could of course monkeypatch that to happen though.

debugging old_deps is nooooo fun when the only exception being surfaced to me is errors inside TracebackException. 🙃

@jakkdl jakkdl requested a review from belm0 August 31, 2024 11:34
requirements-extras.in Outdated Show resolved Hide resolved
trio_websocket/_impl.py Outdated Show resolved Hide resolved
Comment on lines 184 to 185
connection: WebSocketConnection|None=None
result2: outcome.Maybe[None] | None = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot if we discussed already, but I think Optional[Foo] is more friendly / readable than Foo | None.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think |None is the standard way of doing it nowadays, and pyupgrade/ruff will warn on doing Optional[...]: https://docs.astral.sh/ruff/rules/non-pep604-annotation/

I also think it's nice for newbies not to have another keyword to know, and highly personally I prefer not having to do another import for it.

trio_websocket/_impl.py Outdated Show resolved Hide resolved
tests/test_connection.py Outdated Show resolved Hide resolved
tests/test_connection.py Outdated Show resolved Hide resolved
# After opening the connection, we spawn _reader_task in the background and
# yield to user code. If only one of those raise a non-cancelled exception
# we will raise that non-cancelled exception.
# If we get multiple cancelled, we raise the user's cancelled.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multiple Cancelled was an existing possible case, so do we need to try hard to make it a single exception? (Cancelled is typically caught by trio itself, and otherwise the user had to deal with MultiError Cancelled anyway.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that it's a very easy gotcha to write

try:
  ...
except Cancelled:
  ...

which might work 99% of the time, but then at some random point you get multiple instead of a single (or the other way around) Cancelled, and your exception handler is now broken. I agree that Cancelled is perhaps a special case due to trio handling it, but if it does surface to the user I'm not sure there's much to be gained from having them see multiple of them.

Copy link
Member

@belm0 belm0 Sep 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MultiError with Cancelled can happen fairly easily, and we quickly learned never to write except Cancelled in our application.

By this trio-websocket change, we're masking the user from the effects of a trio anti-pattern, which is concerning. For example, the user may get away with except Cancelled when the trio-websocket API is involved, but then be doubly puzzled when it doesn't work with another API.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hrm, I think this would be a very bad design choice. With this PR we're trying to make the internal nursery an implementation detail, as per https://trio.readthedocs.io/en/stable/reference-core.html#designing-for-multiple-errors
Before strict_exception_groups=True was the default it was expected that a trio developer would be able to handle the occasional MultiError, but since strict=True trio is trying very hard to make it fully consistent - either you always get an ExceptionGroup, or you never get one. And all other library APIs are encouraged to follow in those footsteps.

In the future the users shouldn't need to worry about this anti-pattern at all, so I do not think we should enshrine it in this API.

trio_websocket/_impl.py Outdated Show resolved Hide resolved
trio_websocket/_impl.py Outdated Show resolved Hide resolved
@jakkdl jakkdl requested a review from belm0 September 10, 2024 10:52
@belm0 belm0 merged commit f5fd6d7 into python-trio:master Sep 21, 2024
9 checks passed
@fossdd
Copy link

fossdd commented Sep 21, 2024

hmm, i still got with this patch, failures running the streamlink testsuite...:

=============================================== short test summary info ===============================================
FAILED tests/test_api_validate.py::TestParseQsdValidator::test_failure - assert "ValidationEr...t 'int' (123)" == "ValidationEr...decode' (123)"
FAILED tests/webbrowser/cdp/test_client.py::TestEvaluate::test_exception - ExceptionGroup: Exceptions from Trio nursery (1 sub-exception)
FAILED tests/webbrowser/cdp/test_client.py::TestEvaluate::test_error - ExceptionGroup: Exceptions from Trio nursery (1 sub-exception)
FAILED tests/webbrowser/cdp/test_client.py::TestNavigate::test_detach - ExceptionGroup: Exceptions from Trio nursery (1 sub-exception)
FAILED tests/webbrowser/cdp/test_client.py::TestNavigate::test_error - ExceptionGroup: Exceptions from Trio nursery (1 sub-exception)
FAILED tests/webbrowser/cdp/test_connection.py::TestCreateConnection::test_failure - ExceptionGroup: Exceptions from Trio nursery (1 sub-exception)
FAILED tests/webbrowser/cdp/test_connection.py::TestReaderError::test_invalid_json - ExceptionGroup: Exceptions from Trio nursery (1 sub-exception)
FAILED tests/webbrowser/cdp/test_connection.py::TestReaderError::test_unknown_session_id - ExceptionGroup: Exceptions from Trio nursery (1 sub-exception)
FAILED tests/webbrowser/cdp/test_connection.py::TestSend::test_timeout[Default timeout, response not in time] - ExceptionGroup: Exceptions from Trio nursery (1 sub-exception)
FAILED tests/webbrowser/cdp/test_connection.py::TestSend::test_timeout[Custom timeout, response not in time] - ExceptionGroup: Exceptions from Trio nursery (1 sub-exception)
FAILED tests/webbrowser/cdp/test_connection.py::TestSend::test_bad_command - ExceptionGroup: Exceptions from Trio nursery (1 sub-exception)
FAILED tests/webbrowser/cdp/test_connection.py::TestSend::test_result_exception - ExceptionGroup: Exceptions from Trio nursery (1 sub-exception)
FAILED tests/webbrowser/cdp/test_connection.py::TestHandleCmdResponse::test_response_error - ExceptionGroup: Exceptions from Trio nursery (1 sub-exception)
FAILED tests/webbrowser/cdp/test_connection.py::TestHandleCmdResponse::test_response_no_result - ExceptionGroup: Exceptions from Trio nursery (1 sub-exception)
FAILED tests/webbrowser/test_webbrowser.py::TestLaunch::test_terminate_on_nursery_baseexception - BaseExceptionGroup: Exceptions from Trio nursery (1 sub-exception)
=============================== 15 failed, 6375 passed, 28 skipped, 1 warning in 14.93s ===============================

@bastimeyer
Copy link

bastimeyer commented Sep 21, 2024

Apologies for the off-topic comment, but since I'm the maintainer of Streamlink, I'll have to answer the previous comment.

failures running the streamlink testsuite

trio-websocket is not actually involved in any of the tests.

Streamlink's webbrowser tests fully mock/stub/monkeypatch trio_websocket.connect_websocket_url / WebSocketConnection, because the tests are only interested in message handling, not how the messages are sent or received.

Your test failures are caused by something else. Considering that other tests are failing as well, you should probably have a look at how you package trio. The Streamlink tests have been running just fine since the breaking changes of trio 0.25 were resolved. See GH actions running everything in a clean venv using the latest dependency versions.

tests/webbrowser/test_webbrowser.py::TestLaunch::test_terminate_on_nursery_baseexception

This test is part of an outdated version (changed in d5aa26f / 6.7.3).

tests/test_api_validate.py::TestParseQsdValidator::test_failure

@jakkdl jakkdl deleted the strict_exception_groups_support branch September 24, 2024 10:22
algitbot pushed a commit to alpinelinux/aports that referenced this pull request Sep 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Hide MultiErrors
4 participants