
Implementing general-purpose file list handler #785

Merged 1 commit on Aug 26, 2024
Conversation

@aryanjassal (Member) commented Aug 6, 2024

Description

Currently, Polykey can only return a list of secrets within a vault. This PR implements an RPC handler providing a general-purpose way to list files, with output similar to the UNIX command ls.

Issues Fixed

Tasks

  1. Write RPC handler for UNIX ls
  2. Remove deprecated list command for secrets
  3. Write tests for ls handler
  4. Write handler for streaming file contents (will be completed in the cat command)
  5. Write tests for the file streaming (cat) handler (will be completed in the cat command)

Final checklist

  • Domain specific tests
  • Full tests
  • Updated inline-comment documentation
  • Lint fixed
  • Squash and rebased
  • Sanity check the final build

@aryanjassal aryanjassal changed the title Implementing general-purpose file list Implementing general-purpose file list handler Aug 6, 2024
@aryanjassal aryanjassal self-assigned this Aug 6, 2024
@aryanjassal aryanjassal marked this pull request as draft August 6, 2024 07:29
@aryanjassal (Member Author) commented Aug 9, 2024

I have encountered an issue in the implementation where accessing the stream returns a `read 0` error. This needs some time to fix, so I will have a chat with @tegefaulkes on Monday to work it out.

@CMCDragonkai (Member) commented:

So in MatrixAI/Polykey-CLI#245, the spec nowhere mentions the possibility of getting file contents.

However this PR appears to create a raw stream that can also stream file contents.

I think file content streaming (if you really want to do a sort of archive streaming) should be a separate RPC call.

The ls RPC call should be simplified, and be similar to using a "scandir" in Python, or iterating over directory entries like in fs.opendir.

Basically that means this should work using a server streaming call, not a raw stream.

You can rename the raw stream handler to something like archiveDir, which would then focus on file contents.

I don't think it's always the right thing to have a single flexible god function that can do anything; named functions that do specific things are often much clearer, especially across program interfaces.
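A server-streaming ls in that style could look roughly like the following sketch, where an async generator iterates directory entries the way `fs.opendir` does. The `FileEntryMessage` shape and `listEntries` name are illustrative assumptions, not Polykey's actual API:

```typescript
import { opendir } from 'node:fs/promises';
import * as path from 'node:path';

// Hypothetical message shape for one streamed directory entry.
interface FileEntryMessage {
  path: string;
  type: 'FILE' | 'DIRECTORY';
}

// An async generator maps naturally onto a server-streaming RPC call:
// each yielded value becomes one streamed response message.
async function* listEntries(dirPath: string): AsyncGenerator<FileEntryMessage> {
  const dir = await opendir(dirPath);
  for await (const entry of dir) {
    yield {
      path: path.join(dirPath, entry.name),
      type: entry.isDirectory() ? 'DIRECTORY' : 'FILE',
    };
  }
}
```

Each yielded value would map onto one response message of a server-streaming call, keeping the RPC as simple as a Python scandir loop.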

@tegefaulkes (Contributor) commented:

The way I designed this was in two parts. First, the globWalk utility, which generates the file tree structure using a glob pattern. Second, the serializerStreamFactory, which takes the globWalk generator and serialises it into a binary stream. Stemming from that, we'd have a single RPC handler wrapping the whole functionality of serialising the file tree and its contents.

But you're right, this could be split into two stages.

  1. A server stream that wraps globWalk for getting the file tree.
  2. A raw stream that serialises the contents of secrets.

The problem with this is that we can run into race conditions across the multiple RPC streams, since the locks would only apply to each stream separately. To solve this, we'd need an RPC call that acquires the lock on one or more vaults and lets us reuse that lock for follow-up RPC calls. The lock would be held for the duration of the original call and any follow-up calls, and would need a timeout in case a caller holds it for too long for whatever reason. I was thinking this was needed for some of the more complex UNIX commands such as cp and mv anyway.

This leaves us with a few small changes to how I wrote the utilities.

  1. Make a ServerHandler to wrap the functionality of globWalk.
  2. Rewrite the serializerStreamFactory to only handle the content phase, cutting out the tree phase.
  3. Make a RawHandler wrapping the new serializerStreamFactory functionality.
  4. Make a vault locking RPC call that holds the lock on the requested vaults for the duration of the call. This may need to be a ClientHandler. It will need to grab the lock and return an array of numbers referencing the acquired locks; these numbers would then be used in follow-up RPC calls that require locking, such as the glob walk and the content serialiser.
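Under the stated assumptions (simplified names, no glob matching, no locking), the proposed two-stage split might be sketched like this:

```typescript
import { opendir, readFile } from 'node:fs/promises';
import * as path from 'node:path';

// Stage 1: the tree phase, a simplified stand-in for globWalk. A
// server-streaming call could wrap this generator, yielding paths only.
async function* treePhase(root: string): AsyncGenerator<string> {
  const dir = await opendir(root);
  for await (const entry of dir) {
    const full = path.join(root, entry.name);
    if (entry.isDirectory()) {
      yield* treePhase(full); // recurse into subdirectories
    } else {
      yield full;
    }
  }
}

// Stage 2: the content phase. A raw-stream call could wrap this,
// serialising only the contents of the files selected in stage 1.
async function* contentPhase(
  paths: AsyncIterable<string>,
): AsyncGenerator<Buffer> {
  for await (const p of paths) {
    yield await readFile(p);
  }
}
```

As the comment notes, running these as two independent RPC streams leaves a race window between them, which is what the proposed vault-locking call would close.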

@aryanjassal (Member Author) commented:

> 1. Rewrite the serializerStreamFactory to only handle the content phase. Make a RawHandler for wrapping the new serializerStreamFactory functionality.
> 2. Make a vault locking RPC call that will hold the lock for the requested vaults for the duration of the call. [...]

I'm not sure how to do these tasks. We will need to have a discussion so I can understand and implement them.

@aryanjassal (Member Author) commented:

In src/vaults/fileTree.ts:343, I have modified the serializerStreamFactory to return a generator instead of a stream. This change was necessary to make it work with withVaultsG.

The only change I made was to replace `controller.enqueue(data)` with `yield* data` in the actual code, to make it act as a generator instead of a stream. This, however, broke one of the tests, and I'm unable to understand why or how to fix it.

Summary of all failing tests:

```
 FAIL  tests/vaults/fileTree.test.ts
  ● fileTree › serializer › handles invalid data (with seed=-646156050)

    Property failed after 20 tests
    { seed: -646156050, path: "19", endOnFailure: true }
    Counterexample: [Buffer.from([])]
    Shrunk 0 time(s)
    Got error: Should have thrown an error when parsing

      718 |         return;
      719 |       }
    > 720 |       throw Error('Should have thrown an error when parsing');
          |             ^
      721 |     });
      722 |     // TODO: tests for
      723 |     //  - empty files

      at Error (tests/vaults/fileTree.test.ts:720:13)
      at AsyncProperty.run (node_modules/fast-check/lib/check/property/AsyncProperty.generic.js:49:28)
      Hint: Enable verbose mode in order to have the list of all failing values encountered during the run
      at buildError (node_modules/fast-check/lib/check/runner/utils/RunDetailsFormatter.js:131:15)
      at asyncThrowIfFailed (node_modules/fast-check/lib/check/runner/utils/RunDetailsFormatter.js:148:11)
```
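The conversion described above can be illustrated with a simplified sketch (not the actual fileTree.ts code): the stream factory's `controller.enqueue(data)` becomes a `yield` in the generator form.

```typescript
import { ReadableStream } from 'node:stream/web';

// Stream form: chunks are pushed through a controller.
function serializerStreamFactory(
  chunks: Uint8Array[],
): ReadableStream<Uint8Array> {
  return new ReadableStream({
    start(controller) {
      for (const data of chunks) controller.enqueue(data);
      controller.close();
    },
  });
}

// Generator form: `controller.enqueue(data)` becomes `yield data`
// (or `yield*` when delegating to an inner generator), which composes
// directly with generator-based helpers such as withVaultsG.
async function* serializerGenerator(
  chunks: Uint8Array[],
): AsyncGenerator<Uint8Array> {
  for (const data of chunks) yield data;
}
```

Both forms produce the same sequence of chunks; only the consumption model differs.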

@aryanjassal aryanjassal reopened this Aug 13, 2024
@CMCDragonkai (Member) commented:

> The way I designed this was in two parts. [...] This leaves us with a few small changes to how I wrote the utilities.

You also have to keep in mind how the RPC may be used by third-party client SDKs in the future, so maintaining a simple RPC interface helps adoption. This is why I prefer having both complex and simple RPC options that are composable.

@tegefaulkes (Contributor) commented:

> 1. Rewrite the serializerStreamFactory to only handle the content phase. Make a RawHandler for wrapping the new serializerStreamFactory functionality.
> 2. Make a vault locking RPC call that will hold the lock for the requested vaults for the duration of the call. [...]
>
> I'm not sure how to do these tasks. We will need to have a discussion to help me with understanding and implementing this.

For 1 (the serializerStreamFactory rewrite):

Right now the serializerStreamFactory sends over the tree data and then the content data when yieldContents is set to true. You need to change it so it only takes TreeNodes of type FILE as input and yields the contents of those files. So you're just cutting out some of the functionality it has.
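That narrowing could look like the following sketch, assuming a minimal TreeNode shape (the real type in fileTree.ts has more fields):

```typescript
import { readFile } from 'node:fs/promises';

// Illustrative node shape, not the real fileTree.ts type.
type TreeNode = { type: 'FILE' | 'DIRECTORY'; path: string };

// Consume TreeNodes and yield the contents of FILE nodes only.
async function* serializeFileContents(
  nodes: AsyncIterable<TreeNode> | Iterable<TreeNode>,
): AsyncGenerator<Buffer> {
  for await (const node of nodes) {
    if (node.type !== 'FILE') continue; // non-FILE nodes are simply consumed
    yield await readFile(node.path);
  }
}
```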

For 2 (the vault locking call):

Ignore that for now. Things will work without it; we just run into locking and race condition issues. I'll write up a new issue for it, but we can proceed without it for now.

@tegefaulkes (Contributor) commented:

> In src/vaults/fileTree.ts:343, I have modified the serializerStreamFactory to return a generator instead of returning a stream. [...] This, however, broke one of the tests, and I'm unable to understand why or how to fix it.

I know what's wrong with it, but it's a good object lesson in debugging problems, so I'll let you work this one out. I can tell you this much:

  1. The problem existed before your changes.
  2. It's a common oversight with fast-check: not considering an edge case.
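A minimal reconstruction of that kind of oversight (not the actual test code): a property like "parsing invalid bytes must throw" fails on the empty buffer, because a parser that reads zero chunks never reaches its validation and so never throws. The counterexample fast-check found above was exactly `Buffer.from([])`.

```typescript
// Toy parser: every chunk must start with a magic byte (0x7f here,
// an arbitrary choice for illustration).
function parseChunks(data: Buffer): number {
  let count = 0;
  let offset = 0;
  while (offset < data.length) {
    if (data[offset] !== 0x7f) throw new Error('invalid chunk header');
    offset += 1;
    count += 1;
  }
  // An empty buffer never enters the loop, so it falls through here
  // without throwing -- the edge case the property test missed.
  return count;
}
```

A property asserting "invalid input always throws" needs to either exclude the empty buffer from its generator or treat it as valid input.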

@tegefaulkes (Contributor) commented:

I made a new issue for tracking the vault locking RPC call at #786

@aryanjassal (Member Author) commented:

> Make a RawHandler for wrapping the new serializerStreamFactory functionality.

The specification of the file content streaming RPC call (which I have named VaultsSecretsGetFileContents) isn't very well defined. I'm not sure whether the RawHandler would receive TreeNodes directly, or whether it would receive a pattern that it would then use to match files and return their contents.

After changing the serializerStreamFactory, do I also make the changes in parserTransformStreamFactory, or leave it as-is, still supporting everything including TreeNode, ContentNode, and Uint8Array?

Most importantly, what is the use-case of this RPC handler? Is it to implement something like UNIX's cat functionality? If so, then this really belongs in a separate PR.

I would need clarification on all these points before I can begin implementing the raw handler RPC call.

cc @tegefaulkes

@CMCDragonkai (Member) commented:

Some of these function names seem really generic.

@aryanjassal (Member Author) commented:

> Most importantly, what is the use-case of this RPC handler? Is it to implement something like UNIX's cat functionality? If so, then this really belongs in a separate PR.

This new handler will indeed be for other UNIX commands and is not relevant to the ls command, so it will not be done in this PR.

After having a chat with @tegefaulkes, I was able to better understand the requirements of this RPC and how to achieve it.

The serializerStreamFactory will receive a generator of TreeNodes for the files we want serialised. If the generator yields something that is not a FILE, the value will simply be consumed; otherwise, the file's contents will be serialised. This way, we don't even need special headers like TREE.

The same goes for the parserTransformStreamFactory: we can remove most of the code handling the different headers there, as we are only interested in CONTENT headers.

On that note, the CONTENT header can be simplified further by removing the iNode field, because files are returned in the same order they were requested, so we don't need to track which file is being returned or in what order.
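As a hedged illustration of why the iNode field becomes redundant (this is an assumed wire format, not Polykey's actual one): with ordering preserved, a bare length-prefixed CONTENT frame is enough to reassociate contents with the requested files.

```typescript
// Encode one content frame: a 4-byte big-endian length, then the bytes.
// No per-file identifier is needed when order is preserved end to end.
function encodeContent(content: Buffer): Buffer {
  const header = Buffer.alloc(4);
  header.writeUInt32BE(content.length, 0);
  return Buffer.concat([header, content]);
}

// Decode a stream of frames back into the original content buffers,
// in the same order they were encoded.
function decodeContents(stream: Buffer): Buffer[] {
  const out: Buffer[] = [];
  let offset = 0;
  while (offset < stream.length) {
    const len = stream.readUInt32BE(offset);
    offset += 4;
    out.push(stream.subarray(offset, offset + len));
    offset += len;
  }
  return out;
}
```

The receiver simply zips the decoded buffers with the list of paths it requested.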

@aryanjassal aryanjassal force-pushed the feature-unix-ls branch 2 times, most recently from ca89c94 to f1eac5e on August 23, 2024 08:19
@aryanjassal (Member Author) commented:

As per our discussion, I have removed all the extra features we didn't need and trimmed it down to the bare minimum an ls command should do: list secrets. Advanced functionality can be iterated on in the future, but this will get the command out there.

I am working on more advanced features, like parity with GNU ls or a longer, detailed view of files and directories, which will not be part of this PR; the code for that does exist, though, and those features might make the cut next iteration.

@aryanjassal aryanjassal marked this pull request as ready for review August 23, 2024 09:21
@tegefaulkes (Contributor) left a review:

Some small things need changing.

wip: working on adding unix-like ls for secrets

chore: added custom type and pattern matching

chore: updated caller for VaultsSecretsList

wip: working on passing filetree via rpc

feat: revamped the entire VaultsSecretsList handler

chore: updated handler name

chore: working on fixing read: 0 error

chore: renamed VaultsSecretsList to VaultsSecretsGetFileTree

fix: fixed problem with `withVaultsG` where `this` wasn't defined

[ci skip]

chore: switched from RawHandler to ServerCaller

chore: cleaned up code and started work on param parser

feat: added parsers for parameters

[ci skip]

chore: updated all tests referencing old serializerStreamFactory

feat: switched over the rpc handler from rawhandler to stremhandler

feat: stopped using globWalk for reading and listing files

chore: added error handling for invalid patterns

chore: added error handling for invalid patterns

chore: added symbolic link information to the file stats

chore: cleaned up irrelevant code

chore: fixed tests

chore: updated types

chore: added comments for stream ending prematurely

bug: working out why errors arent being caught by expect

chore: updated tests and added new kinds of errors

chore: added complement for 'ErrorSecretsIsDirectory'

@aryanjassal (Member Author) commented:

Everything looks good. Merging.
