Add support for cancellation of potentially long-running operations #86

aryanjassal · 2024-12-13T01:23:08Z

Specification

Currently, there is no way to cancel or abort an operation after it has been started. This is especially an issue for commands which can run for an arbitrarily long time. For example, the efs.readFile() can take extremely long to complete depending on the file.

If this operation is being used in, say, a RPC call, and the RPC call times out, then the operation will continue to run in the background. This will keep a connection and a file descriptor open in the background and not do anything with it, resulting in leaked resources and blocking the entire application from shutting down.

The easiest way to achieve this is to accept either a context object from js-contexts, or just accept an AbortController.signal.

Additional context

Polykey agent not shutting down gracefully Polykey-CLI#185

Tasks

Implement support for abortion or cancellation
Add tests to confirm this behaviour

The text was updated successfully, but these errors were encountered:

linear · 2024-12-13T01:23:11Z

ENG-489 Add support for cancellation of potentially long-running operations

aryanjassal · 2024-12-13T05:33:23Z

This is pretty important to Polykey#846 as a lot of operations rely on the efs. From cloning vaults to reading secrets. Some secrets can be like PDFs, which can be extremely large. As such, ability to cancel them is critical.

I won't say that this is blocking to the PR, but this would need to be addressed to properly implement cancellation and avoid open file descriptors blocking connections.

CMCDragonkai · 2024-12-15T20:50:16Z

Are you sure that EFS is what is taking the longest time? It's just a DB read of several blocks. And yes, if it is a large file it could take some time, but we are unlikely to have large files.

Furthermore, this may cause some delay in in other parts of the application, but I do not think EFS is likely cause of the problem.

Of course propagating the signal downwards is a good idea in general for any long running async operation, but it shouldn't be done without some comprehensive tests.

aryanjassal · 2024-12-16T01:16:24Z

I remember a few months ago we were trying to upload a fairly large PDF in the vault, but it timed out. What would happen in this timeout? Won't a timed out RPC leave the operation in kind of a danging state and cause resource leaks? Or is that handled some other way in the handler?

Well, perhaps if i implement abortion after performing an efs operation, that might work on stopping the handler properly. Of course, when we add a context or a signal, then a comprehensive testing suite will be added to ensure this behaviour works as expected and doesn't break anything else in the process.

CMCDragonkai · 2024-12-17T23:15:05Z

Well that's why I was talking about timer for long running operations is reset upon successful response.

So the problem with uploads is that client streaming which seems like the right RPC call, the response may be waiting for the client to finish.

So I believe we consider it to be a per-message timeout. The timeout should reset whether a message is sent or received.

This makes it a universal standard behaviour whether it is unary, client streaming, server streaming, duplex streaming.

This means when you do a client.methods.x({ timeout: ... }), that timeout has a different meaning than regular function calls.

However on the other hand, if we want to preserve the original ctx meaning of that "this operation" is completed. Then the issue is that for many streaming contexts, it has to be infinite, and instead you might swap for an idle timeout.

The reason for this is that the timeout is usually passed down to downstream operations on the handler side. So if the meaning changes from idle timeout to operational timeout, that can get very confusing.

However it's also important that this is different from transport level idle timeout.

The exact design isn't settled right now and I think we need to actually conclude on the proper design here: https://github.com/MatrixAI/MatrixAI-Graph/issues/43

Use chatgpt to help design a solution that makes sense, make suggestions.

aryanjassal added the development Standard development label Dec 13, 2024

aryanjassal mentioned this issue Dec 13, 2024

Implement GitHub CI for js-encryptedfs #87

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support for cancellation of potentially long-running operations #86

Add support for cancellation of potentially long-running operations #86

aryanjassal commented Dec 13, 2024

linear bot commented Dec 13, 2024

aryanjassal commented Dec 13, 2024

CMCDragonkai commented Dec 15, 2024

aryanjassal commented Dec 16, 2024

CMCDragonkai commented Dec 17, 2024

Add support for cancellation of potentially long-running operations #86

Add support for cancellation of potentially long-running operations #86

Comments

aryanjassal commented Dec 13, 2024

Specification

Additional context

Tasks

linear bot commented Dec 13, 2024

aryanjassal commented Dec 13, 2024

CMCDragonkai commented Dec 15, 2024

aryanjassal commented Dec 16, 2024

CMCDragonkai commented Dec 17, 2024