Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preparing for Testnet Deployment #378

Merged
merged 39 commits into from
Jun 14, 2022
Merged

Preparing for Testnet Deployment #378

merged 39 commits into from
Jun 14, 2022

Conversation

tegefaulkes
Copy link
Contributor

@tegefaulkes tegefaulkes commented Jun 1, 2022

Description

Note that this has been migrated from #326.

Once #310 has been merged, we'll finally be able to move onto the deployment of our testnet into AWS.

I foresee this to be achievable in a few stages:

  1. Deploy Polykey onto AWS and ensure we have a working remote keynode (Testnet Node Deployment (testnet.polykey.io) #194)
  2. Create the testnet seed nodes and store these securely somewhere (Testnet securely maintain a pool of recovery codes #285)
  3. Use these seed nodes for our domain name (Update testnet.polykey.io to point to the list of IPs running seed keynodes #177)

Issues Fixed

Testnet deployment:

NodeGraph Structure:

Node Adding Policies:

Node Removal Policies:

Tasks

  • [ ] 2. Complete Support host (IP address) in parseSeedNodes #324 - being completed in Extracting Node Connection Management out of NodeManager to NodeConnectionManager #310
  • 1. Read through and revisit all previous issues
  • [ ] 3. Complete Testnet Node Deployment (testnet.polykey.io) #194 - Extracted out, should be finished after this PR merges
    • Ultimately provides us with a deployment of Polykey on AWS!
    • Task list there should be sufficient to follow process.
  • [ ] 4. Complete Testnet securely maintain a pool of recovery codes #285 - Extracted out, should be finished after this PR merges
  • [ ] 5. Complete Update testnet.polykey.io to point to the list of IPs running seed keynodes #177 - Extracted out, should be finished after this PR merges
  • [ ] 6. Created automated testing that utilises the testnet - to be done in Tests for NAT-Traversal and Hole-Punching #357
  • 7. Add agent service nodes tests Testnet Node Deployment (testnet.polykey.io) #194 (comment)
    • - nodesChainDataGet
    • - nodesClosestLocalNode
    • - nodesHolePunchMessage
  • 8. Changed Cmd to EntryPoint because PK only has 1 executable and makes docker run ... easier.
  • 9. The --seed-nodes='<defaults>;...' will now mean that any specified seed nodes overrides the defaults rather than the other way around
  • 10. If seed nodes contains the own NodeID, the own NodeId will be automatically filtered.
  • 11. Make pk agent start return status information such as nodeId and not just recoveryCode
  • 12. All PK commands can contact a remote agent (except agent start and the like), it needs to be made more obvious whether we are contacting the local agent or remote agent Testnet Deployment #326 (comment)
    • This is done by making the required client connection parameters of node ID, client host and client port all or nothing when set. This means if they are partially set, this results in a new exception ErrorCLIClientOptions.
    • If they are all set, that means a connection to an agent not specified by the local status.
    • If none are set, this means relying on the local status.
  • 13. The seed node should be adding the details of any node connecting to it to its own NodeGraph: see issue Seed node not adding details of connecting node to its NodeGraph #344
    • currently the seed node does not add an entry in its NodeGraph for a connecting node
    • this is essential for the Kademlia implementation to start to function, and for the node's details to eventually propagate out into the wider Polykey network
    • see Testnet Deployment #326 (comment)
  • 14. Investigate and fix encountered Polykey errors/problems during testnet testing Testnet Deployment #326 (comment):
  • 15. Refactor NodeGraph structure
    • The new structure will use 3 nested sublevels to achieve Support read-after-write consistency for NodeGraph bucket operations #244
      • buckets sublevel will contain each bucket, where each bucket sublevel contains NodeId to NodeData
      • meta sublevel contains each bucket, where each bucket sublevel is a config structure, currently only count
      • lastUpdated or index sublevel that contains each bucket, and each bucket sublevel contains lexi(lastUpdated)-NodeId to NodeId, the key is a compound index, allowing us to efficiently acquire the most up-to-date or least up-to-date node entry
    • New methods like getNodes and getBuckets will allow us to interrogate the state of the NodeGraph efficiently, by streaming data out of the NodeGraph, this will be important for debugging, and later analytics
    • Integration of DBTransaction and new iterator feature of sublevels that allow us to maintain a snapshot of the DB at a point in time
    • Reviewing kademlia trie structure and bringing ideas in - reviewed and discovered that trie structure properties are not necessary in our NodeGraph, our design is already efficient and optimal
  • 16. Optimise getClosestNode method based on new NodeGraph structure
  • 17. Relationship between NodeGraph and NodeManager should be tightly coupled due to the adding and removal policies of nodes, maintaining the bucket limit must involving pinging old nodes
  • 18. tests for changing root keys and connections.
    • Check how active connections work during a keychange.
    • Check if new connections use new TLS config
    • check if a renew/reset key event propigates to where it is needed.
  • 19. cleaning up and small fixes to nodesUtils parser functions
  • 20. adding force and ping flags to nodes add command

Final checklist

  • Domain specific tests
  • Full tests
  • Updated inline-comment documentation
  • Lint fixed
  • Squash and rebased
  • Sanity check the final build

@tegefaulkes
Copy link
Contributor Author

This was migrated from #326 since we needed to change to a feature branch. Work from this PR/branch from now on.

@CMCDragonkai
Copy link
Member

@tegefaulkes please remember to tick off anything above in case they are done, or cross out things that are no longer relevant in this PR.

@CMCDragonkai
Copy link
Member

Change the base branch to target #374 branch.

@tegefaulkes tegefaulkes changed the base branch from master to feature-dep-upgrades June 1, 2022 05:02
@tegefaulkes
Copy link
Contributor Author

Starting rebase on feature-dep-upgrades.

@tegefaulkes tegefaulkes force-pushed the feature-testnet_prep branch from 0777647 to d28b9e0 Compare June 1, 2022 07:27
@tegefaulkes
Copy link
Contributor Author

I've done the first blush rebase but there are some parts that I have to fix. Most of it is just the NodeGraph since It had a major refactor in this PR. So I need to go over it and apply the locking and DB changes. On top of that there are likely a bunch of bugs that were introduced that I have to review and fix.

@tegefaulkes tegefaulkes force-pushed the feature-testnet_prep branch from d28b9e0 to b7a48ee Compare June 2, 2022 06:43
@tegefaulkes
Copy link
Contributor Author

Finished with the re-base + initial fixes. Everything is building and linting fine now. I'm going to start fixing the tests.

src/utils/utils.ts Outdated Show resolved Hide resolved
src/nodes/utils.ts Outdated Show resolved Hide resolved
@tegefaulkes
Copy link
Contributor Author

I can't get the iterator to work properly with the gt or lt options. I'm not sure if I'm using it wrong or if it's broken. Looking at the js-db tests we don't seem to be testing if they work so it might be broken.

@emmacasolin emmacasolin force-pushed the feature-dep-upgrades branch from 9f4bda2 to 92a96dd Compare June 3, 2022 05:55
@tegefaulkes tegefaulkes force-pushed the feature-testnet_prep branch from 39c3c6e to aa29fa9 Compare June 6, 2022 06:21
@emmacasolin emmacasolin force-pushed the feature-dep-upgrades branch from 489599c to 8c80f02 Compare June 6, 2022 06:48
@tegefaulkes tegefaulkes force-pushed the feature-testnet_prep branch from d45d952 to 3a9f071 Compare June 6, 2022 10:40
@CMCDragonkai CMCDragonkai mentioned this pull request Jun 7, 2022
10 tasks
@emmacasolin emmacasolin force-pushed the feature-dep-upgrades branch from 8c80f02 to c37eddb Compare June 7, 2022 01:43
@CMCDragonkai
Copy link
Member

This PR will need to target staging and rebase on top. However @emmacasolin is still pending some changes to the CI/CD so we can hold off on that for now. There are changes from #374 that this should rebase on, because of all the fixes to logging.

@CMCDragonkai
Copy link
Member

@tegefaulkes is this PR's description updates to point to abort controller changes?

@CMCDragonkai
Copy link
Member

CMCDragonkai commented Jun 7, 2022

Also please clean up the task list, and start adding ETAs into it as well.

What is the final task list?

@tegefaulkes tegefaulkes force-pushed the feature-testnet_prep branch from b8c5d39 to 1246371 Compare June 7, 2022 04:32
@tegefaulkes tegefaulkes changed the base branch from feature-dep-upgrades to staging June 7, 2022 04:32
@tegefaulkes
Copy link
Contributor Author

Re-based and targeting staging now.

tegefaulkes and others added 20 commits June 14, 2022 12:05
`nodeConnectionManager.syncNodeGraph` now refreshes all buckets above the closest node as per the kademlia spec. This means adding a lot of buckets to the refresh bucket queue when an agent is started.

#345
Added support to cancel out of a `refreshBucket` operation. This is to allow faster stopping of the `NodeManager` by aborting out of a slow `refreshBucket` operation. This has been implemented with the `AbortController`/`AbortSignal` API. This is not fully supported by Node14 so we're using the `node-abort-controller` to provide functionality for now.

#345
`NodeManager.setNode` and `NodeConnectionManager.syncNodeGraph` now utilise a single, shared queue to asynchronously add nodes to the node graph without blocking the main loop. These methods are both blocking by default but can be made non-blocking by setting the `block` parameter to false.
#322
Renamed `queueStart` and `queuePush` since `Queue` is its own class now
Simplified some logic using the `promise` utility
This contains fixes for failing tests as well as fixes for tests failing to exit when finished.
This checks if we await things that are not promises. This is not a problem per se, but we generally don't want to await random things.
…des when entering the network

This tests for if the Seed node contains the new nodes when they are created. It also checks if the new nodes discover each other after being created.

Includes a change to `findNode`. It will no longer throw an error when failing to find the node. This will have to be thrown by the caller now. This was required by `refreshBucket` since it's very likely that we can't find the random node it is looking for.
DB and type changes using new transactions
Linting
Fixing timeouts
Exports moved to end of the file as part of the `export {}` block
Cleaned up TODOs and FIXMEs
Fixed index and other incorrect imports
Type fixes in utils
Abort controller functionality is included in node now.
Need to ensure validity of nodes by pinging them before adding them to the node graph.

#322
Added some tests to check that a root keyPair change propagates properly. Also added tests for the change for existing and new node connections.

#317
@emmacasolin emmacasolin force-pushed the feature-testnet_prep branch from e3011da to 7253de8 Compare June 14, 2022 02:17
@emmacasolin
Copy link
Contributor

The wip commits have now been squashed and renamed. All of the tests should still be passing, in which case this can be merged into staging and the CI commits moved on top of this. There shouldn't be anything here that relies on those changes.

@CMCDragonkai CMCDragonkai marked this pull request as ready for review June 14, 2022 02:52
@CMCDragonkai
Copy link
Member

Ok merging into staging, afterwards @emmacasolin you can rebase and move the CI commits above this merge commit.

@CMCDragonkai
Copy link
Member

Many of the issues didn't autoclose because PK's default branch is still master. I've closed relevant issues manually, but we will switch to staging as the default branch once the CI/CD is all fixed up in the staging branch.

@CMCDragonkai
Copy link
Member

Some remaining issues that come out of this PR:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment