Refactor testnet integration testing #81
Comments
To simplify how the integration tests are run, we're removing all usage of env variables and conditional switching in the tests. The command used for the integration tests will be coded into the test and not expected to be set by the CI job as an env variable. In the case of the docker tests, the CI job is simplified to:

```sh
nix-shell --arg ci true --run $'
image_and_tag="$(docker load --input ./builds/*docker* | cut -d\' \' -f3)";
docker tag "$image_and_tag" "polykey:testTarget";
npm run test tests/integration/docker;
'
```

The command used in the docker test will be hard-coded as `docker run ${dockerOptions} polykey:testTarget`, so there will be no need for env variables. So long as a Polykey image tagged as `polykey:testTarget` exists, the docker test can be run.
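For illustration, a sketch of how the hard-coded call could look inside a Jest test; the test body and docker options here are assumptions, not the actual test code:

```ts
import { execFile } from 'node:child_process';
import { promisify } from 'node:util';

const execFileP = promisify(execFile);

// Assumes the CI job has already loaded and tagged the image as `polykey:testTarget`
test('polykey runs inside docker', async () => {
  const dockerOptions = ['--rm'];
  const { stdout } = await execFileP('docker', [
    'run',
    ...dockerOptions,
    'polykey:testTarget',
    '--help',
  ]);
  expect(stdout.length).toBeGreaterThan(0);
});
```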
I'd suggest keeping the container name relatively the same. Isn't the name `polykey-cli`?
As for the tag: don't use camel case. Tags should be one word.
Also, can't we pass the tag into the tests via a Jest parameter?
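For what it's worth, Jest can inject values through its `globals` config, and the `--globals` CLI option accepts a JSON override per run. A minimal sketch, assuming a hypothetical `imageTag` global (this is not the actual test setup):

```ts
// jest.config.ts
import type { Config } from 'jest';

const config: Config = {
  globals: {
    imageTag: 'polykey:testtarget', // default tag, overridable per run
  },
};

export default config;

// In a test file:
//   const tag = (globalThis as any).imageTag as string;
// Overriding on the command line:
//   jest tests/integration/docker --globals='{"imageTag":"polykey:ci"}'
```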
I'll change the name of the tagged image, but I don't know of a good way to pass the image name in when running the tests. I'm keeping that hard-coded for now.
I need to spec out how to handle checking whether the seed nodes have updated. This can be done simply with two pieces of information:

1. The version the seed nodes are expected to update to.
2. The versions the live seed nodes are currently running.

We can compare the seed nodes to the new version and wait for all live nodes to match that version. We can take the expected version from the config file to know 1, and we'll need to add an API endpoint to the Polykey dashboard for 2. The Polykey dashboard API needs an endpoint that returns all of the active nodes and their status. This will be similar to getting the status of a node via the CLI, but we only want to present public information here. We can include the following information; any suggestions are welcome:

```ts
{
  nodeId: NodeIdEncoded,
  agentHost: Host,
  agentPort: Port,
  upTime: number,
  connectionsActive: number, // Number of active connections
  nodesTotal: number, // Number of seen nodes in the node graph
  versionCLI: string, // Version of the Polykey CLI code
  versionPolykey: string, // Version of the Polykey lib code
  versionState: number, // Version of the state schema
  versionNetwork: number, // Version of the network API
}
```

The dashboard will have to poll the seed nodes for this information and cache it; it won't be ideal for every request to trigger asking every seed node. To implement this API endpoint we need some way of asking the seed nodes for this information. I am unsure how we check seed nodes already. Is this done via the client RPC? Maybe we just need to expand the existing status RPC call to return the version information. @tegefaulkes will need to work with @amydevs to expand this spec, and maybe make a new issue for tracking this functionality specifically.
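To make the caching idea concrete, here's a minimal sketch, assuming a hypothetical `queryNodeStatus` helper that wraps whatever status RPC call the dashboard ends up using:

```ts
type SeedNodeStatus = {
  nodeId: string;
  agentHost: string;
  agentPort: number;
  upTime: number;
  connectionsActive: number;
  nodesTotal: number;
  versionCLI: string;
  versionPolykey: string;
  versionState: number;
  versionNetwork: number;
};

const POLL_INTERVAL_MS = 60_000;
let cache: { fetchedAt: number; statuses: Array<SeedNodeStatus> } | null = null;

async function getSeedNodeStatuses(
  seedNodes: Array<{ host: string; port: number }>,
  queryNodeStatus: (host: string, port: number) => Promise<SeedNodeStatus>,
): Promise<Array<SeedNodeStatus>> {
  const now = Date.now();
  // Serve cached results so every dashboard request doesn't hit every seed node
  if (cache != null && now - cache.fetchedAt < POLL_INTERVAL_MS) {
    return cache.statuses;
  }
  const statuses = await Promise.all(
    seedNodes.map(({ host, port }) => queryNodeStatus(host, port)),
  );
  cache = { fetchedAt: now, statuses };
  return statuses;
}
```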
I think @CMCDragonkai mentioned wanting to separate the version information into an endpoint separate from the basic information like remote info.
Other than that, I think the properties are okay.
That works too; it's just a little more annoying to iterate over keys in a POJO.
Parts 1 and 2 can be tested, but it needs to be merged first. I can defer part 3 to a new issue/PR as it depends on some changes external to this repo. I'm going to move forward with testing parts 1 and 2.
I think it would make sense for the versions to be exposed via both the `agentStatus` RPC call and a separate version RPC call.
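One way to avoid duplicating the logic would be a shared helper feeding both handlers. A sketch with assumed names (none of these match the actual Polykey RPC API):

```ts
type VersionInfo = {
  versionCLI: string;
  versionPolykey: string;
  versionState: number;
  versionNetwork: number;
};

// Placeholder values; in practice these would come from package and config metadata
function getVersionInfo(): VersionInfo {
  return {
    versionCLI: '0.0.0',
    versionPolykey: '0.0.0',
    versionState: 0,
    versionNetwork: 0,
  };
}

// Versions embedded alongside the rest of the status payload...
function agentStatusHandler(basicInfo: Record<string, unknown>) {
  return { ...basicInfo, ...getVersionInfo() };
}

// ...and exposed on their own for lightweight version checks
function agentVersionHandler(): VersionInfo {
  return getVersionInfo();
}
```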
Things are mostly working with the docker integration tests now, except for one problem. One of the tests is failing due to a CLI call made to an agent failing. It's failing with…
Specifically the … I'll do a little more digging, but this isn't the priority right now, so I may have to shortcut the tests for now. I can see that the connections are being made and the agent isn't crashing, so that'll have to be enough for now.
Did you make sure you're using …?
The CI pipeline has passed now, so points 1 and 2 are confirmed done. The docker integration tests are really basic right now, with just a basic start in the foreground and a start that joins the testnet. There are some things that need to be looked into with the docker integration tests. Right now there is an issue with making a client CLI call, but starting an agent works fine. Since we're running …
Maybe this is why the diagnostics and audit domains are going to be important to provide that observability. Remember, the struggles you're having right now are the same struggles that customers will have when they deploy PK into their containerized infrastructure. So it's important for you to work through and have a clear understanding of "what the hell is going on" and all the failure modes here.
Yes, it's using the default, so it should be binding to the wildcard. I saw that external connections to the …
Don't rely on defaults when doing tests. You never know what the defaults may change to later when we are updating the code. In tests, it's better to be explicitly configured than to rely on implicit defaults. Make sure it is in fact binding to the right places.
I'm considering this done now. I've modified the deployment jobs to wait for the seed nodes to update. I have not removed the part that triggers the deployment, since I haven't confirmed that the lambda that handles this now is working properly. I don't think it is triggering properly, but I don't know enough about how it works to debug it quickly, and @amydevs has already left for Christmas. Everything still functions without it for now, but when it does work I need to modify the deployment job one last time.
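A rough sketch of what the waiting step could look like, assuming the dashboard exposes the versions at a hypothetical endpoint (the URL and response shape here are assumptions):

```ts
async function waitForSeedNodes(expectedVersion: string): Promise<void> {
  const maxAttempts = 60;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const response = await fetch('https://testnet.polykey.com/api/seednodes/status');
    const statuses: Array<{ versionPolykey: string }> = await response.json();
    // Proceed once every live seed node reports the expected version
    if (statuses.every((s) => s.versionPolykey === expectedVersion)) return;
    await new Promise((resolve) => setTimeout(resolve, 30_000)); // poll every 30s
  }
  throw new Error(`Seed nodes did not update to ${expectedVersion} in time`);
}
```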
I think I know how to fix it; I'll do it later. I realised that the execution role of the lambda needed a policy that allowed it to invoke the lambda itself. It's a bit unintuitive, but I think the reason is that the event rule assumes the execution role when executing the event target lambda.
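In Pulumi terms, the fix amounts to a self-invoke policy on the execution role. A sketch with assumed resource names (the real resources live in the infrastructure repo):

```ts
import * as pulumi from '@pulumi/pulumi';
import * as aws from '@pulumi/aws';

const role = new aws.iam.Role('deployLambdaRole', {
  assumeRolePolicy: aws.iam.assumeRolePolicyForPrincipal({
    Service: 'lambda.amazonaws.com',
  }),
});

const deployLambda = new aws.lambda.Function('deployLambda', {
  role: role.arn,
  runtime: 'nodejs18.x',
  handler: 'index.handler',
  code: new pulumi.asset.AssetArchive({
    '.': new pulumi.asset.FileArchive('./lambda'),
  }),
});

// The execution role must be allowed to invoke the lambda itself, since the
// event rule assumes this role when executing the event target
new aws.iam.RolePolicy('deployLambdaSelfInvoke', {
  role: role.id,
  policy: deployLambda.arn.apply((arn) =>
    JSON.stringify({
      Version: '2012-10-17',
      Statement: [
        { Effect: 'Allow', Action: 'lambda:InvokeFunction', Resource: arn },
      ],
    }),
  ),
});
```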
Is there a way to visualize this somehow? https://www.pulumi.com/blog/six-things-about-pulumi-service/. I kind of want to know what happened to running our own DB. If we are using Pulumi's state, what's the billing situation for it? Is this a trial? I don't remember setting up a Pulumi account for Matrix AI. I think in the new year it would be good for you to do a presentation explaining how the Pulumi stack works in PKI, for @tegefaulkes, myself, and others. It would make a good blog post. I want to compare it with https://www.winglang.io/; there are lots of interesting lessons from the Matrix OS and Architect language work that Matrix AI did before.
I've fixed this. Lambda works as intended now.
The lambda apparently has a different set of permissions? Did you configure it all in Pulumi?
Yep @CMCDragonkai
Ok, make sure you've documented it there too.
Specification
We need to refactor and streamline testnet integration testing. There are a few core goals of this.
All of the `testif` usage in the standard tests needs to be removed; the integration tests should not just be repeats of certain standard tests. The integration tests may still use the `testif` utility to prevent some tests from running on certain platforms; this would need to be looked into. Integration tests need to be moved into a separate test directory to keep them isolated from the standard tests. This will allow us to easily run the standard and integration tests separately. The integration tests should focus explicitly on two aspects: running on the target platforms, and connecting to the `testnet`.

The CI jobs need to be updated as well. Currently we have a job that triggers the seed node tasks to restart with the latest image we deployed. This will be removed and handled by the infrastructure repo. This step should be replaced with a job that waits for the seed nodes to update before proceeding. We'll need an API endpoint on the Polykey dashboard that exposes the seed node versions in some fashion to help with this. @tegefaulkes will need to work with @amydevs to spec this out and address it.
Waiting for the seed nodes to deploy and update is not essential here; any breaking changes in the agent-agent RPC should be pretty rare. I'm going to address it last, as things should function without it for now.
Additional context
testnet.polykey.com
#71

Tasks
1. Remove the `testif` utility from the standard tests.
2. Move the integration tests into a separate test directory and focus them on the `testnet`.
3. Update the CI jobs. To this end the following changes are to be made:
   - The job that triggers the `testnet` seed nodes to restart using the new image will be removed. This will be handled by the `testnet` infrastructure.
   - Add a `polykey` dashboard endpoint that will either return when all seed nodes have been updated, or list the versions of all the seed nodes. This will need to be spec'd out.