-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for cancellation of potentially long-running operations #86
Comments
This is pretty important to Polykey#846 as a lot of operations rely on the I won't say that this is blocking to the PR, but this would need to be addressed to properly implement cancellation and avoid open file descriptors blocking connections. |
Are you sure that EFS is what is taking the longest time? It's just a DB read of several blocks. And yes, if it is a large file it could take some time, but we are unlikely to have large files. Furthermore, this may cause some delay in in other parts of the application, but I do not think EFS is likely cause of the problem. Of course propagating the signal downwards is a good idea in general for any long running async operation, but it shouldn't be done without some comprehensive tests. |
I remember a few months ago we were trying to upload a fairly large PDF in the vault, but it timed out. What would happen in this timeout? Won't a timed out RPC leave the operation in kind of a danging state and cause resource leaks? Or is that handled some other way in the handler? Well, perhaps if i implement abortion after performing an efs operation, that might work on stopping the handler properly. Of course, when we add a context or a signal, then a comprehensive testing suite will be added to ensure this behaviour works as expected and doesn't break anything else in the process. |
Well that's why I was talking about timer for long running operations is reset upon successful response. So the problem with uploads is that client streaming which seems like the right RPC call, the response may be waiting for the client to finish. So I believe we consider it to be a per-message timeout. The timeout should reset whether a message is sent or received. This makes it a universal standard behaviour whether it is unary, client streaming, server streaming, duplex streaming. This means when you do a However on the other hand, if we want to preserve the original ctx meaning of that "this operation" is completed. Then the issue is that for many streaming contexts, it has to be infinite, and instead you might swap for an idle timeout. The reason for this is that the timeout is usually passed down to downstream operations on the handler side. So if the meaning changes from idle timeout to operational timeout, that can get very confusing. However it's also important that this is different from transport level idle timeout. The exact design isn't settled right now and I think we need to actually conclude on the proper design here: https://github.com/MatrixAI/MatrixAI-Graph/issues/43 Use chatgpt to help design a solution that makes sense, make suggestions. |
Specification
Currently, there is no way to cancel or abort an operation after it has been started. This is especially an issue for commands which can run for an arbitrarily long time. For example, the
efs.readFile()
can take extremely long to complete depending on the file.If this operation is being used in, say, a RPC call, and the RPC call times out, then the operation will continue to run in the background. This will keep a connection and a file descriptor open in the background and not do anything with it, resulting in leaked resources and blocking the entire application from shutting down.
The easiest way to achieve this is to accept either a context object from js-contexts, or just accept an
AbortController.signal
.Additional context
Tasks
The text was updated successfully, but these errors were encountered: