Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RPC timeout implementation #513

Merged
merged 25 commits into from
Apr 6, 2023
Merged

RPC timeout implementation #513

merged 25 commits into from
Apr 6, 2023

Conversation

tegefaulkes
Copy link
Contributor

@tegefaulkes tegefaulkes commented Mar 14, 2023

Description

This PR handles adding the timeout features to the agnostic RPC system

Issues Fixed

Tasks

  • 1. Update RPCServer to support timeouts.
  • 2. Update RPCClient to support timeouts.
  • 3. Update client handlers to make use of the new cancellation/timeout ctx.
  • 5. Update agentHandlers to make use of the new cancellation/timeout ctx. This is blocked by Agent RPC migration #512
  • 6. Support communicating timeouts between RPCClient and RPCServer. support updating the timeouts based on these communicated values.
  • 7. Update WebSocketClient to take a ctx: ContextTime and allow it to abort the current connection.

Final checklist

  • Domain specific tests
  • Full tests
  • Updated inline-comment documentation
  • Lint fixed
  • Squash and rebased
  • Sanity check the final build

@tegefaulkes tegefaulkes self-assigned this Mar 14, 2023
@tegefaulkes tegefaulkes linked an issue Mar 14, 2023 that may be closed by this pull request
@ghost
Copy link

ghost commented Mar 14, 2023

👇 Click on the image for a new way to code review

Review these changes using an interactive CodeSee Map

Legend

CodeSee Map legend

@CMCDragonkai
Copy link
Member

When this is integrated into the handlers, does this need to be rebased on top of #509?

@CMCDragonkai
Copy link
Member

@tegefaulkes can you also flesh out the remaining tasks here, and what tests you have.

@tegefaulkes tegefaulkes force-pushed the feature-agnostic_timeout branch from 348f26f to 52d9186 Compare March 17, 2023 00:00
@tegefaulkes
Copy link
Contributor Author

Re-based on #509

@tegefaulkes tegefaulkes changed the base branch from staging to feature-client_migration March 17, 2023 00:19
@CMCDragonkai CMCDragonkai force-pushed the feature-client_migration branch from 6028a7d to 000b16a Compare March 22, 2023 22:14
@tegefaulkes tegefaulkes force-pushed the feature-client_migration branch 2 times, most recently from 043066d to 60e3d13 Compare March 24, 2023 03:34
@tegefaulkes
Copy link
Contributor Author

This needs to be rebased on staging after the recent merge. Doing that now.

@tegefaulkes tegefaulkes changed the base branch from feature-client_migration to staging March 27, 2023 01:53
* updated `@matrixai/timer` to `1.1.0`
* updated `@matrixai/async-cancellable` to `1.0.4`

[ci skip]
@tegefaulkes tegefaulkes force-pushed the feature-agnostic_timeout branch from 9675afc to 5a7c48c Compare March 27, 2023 03:32
@tegefaulkes
Copy link
Contributor Author

Cleaned up the commit history and re-based on staging. I'll check for any problems that the re-base may have introduced and then work on the RPCClient timeout support.

@tegefaulkes
Copy link
Contributor Author

RPCClient needs the following features to support timeouts.

  1. A default timeout value to be used when none is provided during a client call.
  2. Client calls need an optional ContextedTimed ctx. We should use the timedCancellable decorator for the caller methods.
  3. A timeout timer needs to be created for each call if none was provided.
  4. The timer needs to be refreshed when data is received or sent.
  5. The timer needs to be cleaned up when call ends or is cancelled.

@tegefaulkes
Copy link
Contributor Author

I can't use the TimedCancellable decorator for most of the caller methods. The decorator cleans up any AbortController or timer it creates when the method call has ended. The problem is, since we're returning streams most of the time. The context needs to exist for the duration of the streams and not just the method creating them.

@tegefaulkes
Copy link
Contributor Author

tegefaulkes commented Mar 27, 2023

It is possible to abort a webstream. I've been doing it with readable.pipeTo(writable, {signal}). This is the only way to cancel a stream if you don't control either side of it due to locking.

This comes with a small problem. It is possible to provide a reason for abortion. But this reason isn't being propagated through the stream. Aborting the stream this way results in the error AbortError: The operation was aborted for the stream. This is the same as the default DOMException error for abortion that we saw when fixing the promiseCancellable problem.

Ultimately this means our abort or timeout reason won't be thrown if the stream times out like this. Not through the streams at least.

@CMCDragonkai
Copy link
Member

CMCDragonkai commented Mar 27, 2023

I can't use the TimedCancellable decorator for most of the caller methods. The decorator cleans up any AbortController or timer it creates when the method call has ended. The problem is, since we're returning streams most of the time. The context needs to exist for the duration of the streams and not just the method creating them.

You don't need to use the decorators. Use the HOF variant OR you can just construct the object yourself and maintain lifecycle somehow. Those were intended for convenience of thing it to the function scope context. But your timer object should be tied to the stream life cycle.

@CMCDragonkai
Copy link
Member

  1. Timeout property is optional, and should be managed like any other metadata param just like authentication. Which means middleware.
  2. Rather than using signal reason from toPipe, maintain a timer and abort controller attached to the lifecycle of every stream at the RPCClient and RPCServer level. Then you have more flexibility here, and can inject these control points to whatever needs it.
  3. It is possible that the streamFactory can receive an abort controller, or even a timer really. It's an async function can be aborted. Right now js-quic doesn't take this. But you could update your websocket transport layer to deal with it accordingly.

@tegefaulkes
Copy link
Contributor Author

tegefaulkes commented Mar 28, 2023

Few notes.

  1. It's possible to refresh the timer when sending or receiving a message. To do this we need to make use of a pass-through TransformStream. We should do this to match the watchdog style timeout of the server.
  2. In the absence of using a signal with a pipeTo, the caller has no way of ending a stream directly when it times out. This leaves us two options. The stream creator, this case the WebSocketServer or the quic system, needs to respond to an abort signal by cancelling and cleaning up the stream. The other alternative is to pass in a timer and refresh it directly or race it when interacting with the client

To emphasise point 2. the ctx: ContextTimed being passed into the stream factory is expected to do two things.

  1. Race the stream creation. The WebSocket domain does create a new connection for each stream. The quic system does not, if there is an active connection then a stream is instant with possibly some negotiation?
  2. Terminate a running stream. Say if a connection is lagging, the other side isn't responding but the connection is good, the connection is failed but the stream hasn't ended yet.

Point 2. is needed for the timeout mechanism to work properly. From the RPCClient's perspective a stream has 3 parts worth mentioning, the end that the user interacts with, the middle that the caller method can see and the end that the transport layer is interacting with.

Due to how stream locking works and composing, you can't really interact with a stream from the middle. And from the RPCClients perspective once we construct the transformation and pass either end to the user and transport layer, we have no control over that stream. So if we want to timeout and close the stream automatically, the only options are tell the transport layer to do it via the ctx. Or interact with the middle with the pipeTo which as mentioned before doesn't give clean results.

The only option left is to have the transport layer respect a signal and terminate the stream.

@tegefaulkes
Copy link
Contributor Author

When creating the RPCClient, at the time I opted not to include a map of active connections and the ability to await all of the/ force end them. At the time it didn't seem necessary due to the life cycle of a connection being tied closely to the life cycle of the the calling method or stream.

based on previous discussion I'll look into adding it in if only for debug usage.

@tegefaulkes
Copy link
Contributor Author

From the client caller method's perspective, there are two kinds of timers, one that was provided and a default timer.

If we provide a timer when making a call, I'm not sure we want to be refreshing when messages are sent. If we are composing multiple actions under a single timeout, we don't want an rpc call to mutate it. We may also want it to.

So the question is, do we want to enable refreshing implicitly based on if the timer was provided, or based on an option or both?

Right now I'm thinking, it will do refreshing if it's using the default timer, or not do refreshing if a timer was provided. This can be overridden with an option.

@tegefaulkes
Copy link
Contributor Author

Quick note, if a Timer is left running then it holds the process open. This is caused by the underlying timer from setTimeout() and setInterval(). They must be cancelled to prevent the process from being held open.

So if you create a Timer for a timeout when making an RPC call, it must be cancelled. Otherwise it's essentially a resource leak.

Timers might benefit from a withTimer context function.

@CMCDragonkai
Copy link
Member

Quick note, if a Timer is left running then it holds the process open. This is caused by the underlying timer from setTimeout() and setInterval(). They must be cancelled to prevent the process from being held open.

So if you create a Timer for a timeout when making an RPC call, it must be cancelled. Otherwise it's essentially a resource leak.

Timers might benefit from a withTimer context function.

Its the job of the caller when passing in the Timer to ensure deallocation when appropriate. So rpc call can't do anything. RPC call should close it if the timer was created by the rpc call and not externally. Basically ownership encompasses lifecycle.

@tegefaulkes
Copy link
Contributor Author

WebSocketClient needs to be updated to take a ctx: ContextTimed when for startConnection(). The timeout logic can also use improvements.

@CMCDragonkai
Copy link
Member

Ready for review?

Should we skip the agent migration or should just be merged?

I think after this you should go straight to QUIC so we can start on it.

For duplex and client streaming it is required that the user manually cleans up streams.

[ci skip]
[ci skip]
…dableWritablePair` and includes a `cancel(reason?: any)` method and generic metadata

- created `RPCStream` interface, updating code
- fixing tests
- client now includes metadata in remote errors
- `RPCServer` includes metadata in the `RPCStream` and propagates cancel/meta to middleware
- improved cancellation feature of the server `RPCStream`

[ci skip]
@tegefaulkes tegefaulkes force-pushed the feature-agnostic_timeout branch from 671cd10 to f5030d6 Compare April 6, 2023 03:50
@tegefaulkes
Copy link
Contributor Author

Fixed up and ready to merge.

@tegefaulkes tegefaulkes merged commit 67289cd into staging Apr 6, 2023
@CMCDragonkai
Copy link
Member

Nice!

@CMCDragonkai
Copy link
Member

Hey I think fix: test fixes should have a test scope in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
2 participants