tests(pd): automated migration testing #4337
Marking as ready for review, so I can stop rebasing it. 😅 The immediate consequence of merging this PR is that CI runs will take >20m again, but I'm happy to dial that back and rely on the cron schedule. For now, though, having it run on every PR will give us confidence about the state of migrations as we push toward #4497.
i'm going to somewhat cautiously hit the "approve" button here. thanks so much for your work to give us automated coverage of migrations. that's tremendously valuable work.
we should have automated migration testing, and i want that sooner rather than later, but... i'm a little nervous about how these are shaped. they feel somewhat heavy on boilerplate yaml, and there's a lot of shell glue surrounding the actual test.
i think a lot of the logic here could also be (and is, in the case of delegating) covered by mock consensus tests, but the broader vision of having integration tests that run a real CometBFT node feels topically relevant for migrations.
i've left a smattering of comments below. i'm happy to see this merge before those things are addressed, being acutely aware that this makes life easier while managing upgrades ❤️ ...but maybe file tracking ticket(s) for following up on these points.
# By default, don't enable migration-tests: require explicit opt-in via
# `--features migration-test`.
default = []
migration-test = []
nice 👍
fn main() {
    println!("Hello, world!");
}
Suggested change:
- fn main() {
-     println!("Hello, world!");
- }
+ //! This crate is used to group migration tests, this binary does not do anything.
+ fn main() {
+     println!("Hello, world!");
+ }
this empty file was a little confusing for a moment, so a comment might help signal that this `main.rs` is only here as a stub.
@@ -0,0 +1,35 @@
[package]
name = "migration-test"
Suggested change:
- name = "migration-test"
+ name = "penumbra-migration-test"
a nit, but this should probably have the same `penumbra-` prefix as other crates in the workspace.
# By default, build pd from the workspace. Support overriding via a deeper git-worktree,
# so that an older version of pd can be built and run. This helps when running older
# networks locally, to debug migrations.
vars:
  WORKING_DIR: .
  # WORKING_DIR: deployments/worktrees/v0.73.1
i appreciate these thoughtful breadcrumbs for future tinkering. nice 🪙
it's a bit difficult to understand what this is testing. my understanding is that this is running the smoke tests on the v0.76.0 release? i don't think i am connecting the dots to how this is exercising migration logic.
it'd be helpful to (a) name these files more communicatively (possibly by putting them in a folder together), and (b) put a comment at the top of the file, in place of the boilerplate there right now. this isn't a configuration for running migration tests, it is a migration test, aiui.
...as i've read this more, i see these are phases. i'll mark this as resolved, but i did find coming to this understanding to be difficult. 😓
/// TOML for an "upgrade-plan" governance proposal.
// Intentionally avoiding importing this type to adhere to strict
// CLI interfacts for the pcli binaries.
Suggested change:
- // CLI interfacts for the pcli binaries.
+ // CLI interfacts for the pcli binaries.
i think a sentence got garbled here!
    height: u64,
}

#[cfg_attr(not(feature = "migration-test"), ignore)]
Suggested change:
- #[cfg_attr(not(feature = "migration-test"), ignore)]
+ #[cfg(feature = "migration-test")]
ditto, another conditional compilation attribute we can simplify
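For anyone weighing the two attributes in the suggestion above, here is a minimal sketch of how they differ. The helper `upgrade_height` and both test names are hypothetical stand-ins, not code from this PR.

```rust
/// Hypothetical stand-in for real migration-test logic (not from the PR).
fn upgrade_height(current: u64, delay: u64) -> u64 {
    current + delay
}

// Variant A: compiled in every test build, but reported as "ignored"
// unless `--features migration-test` is passed.
#[test]
#[cfg_attr(not(feature = "migration-test"), ignore)]
fn gated_with_ignore() {
    assert_eq!(upgrade_height(100, 10), 110);
}

// Variant B: not compiled at all unless the feature is enabled.
#[cfg(feature = "migration-test")]
#[test]
fn gated_with_cfg() {
    assert_eq!(upgrade_height(100, 10), 110);
}
```

With variant A the test body is still type-checked without the feature, so everything it uses has to build. With variant B the test disappears from the build entirely, which is simpler but also means it will not show up as "ignored" in the test report.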
[dependencies]
anyhow = {workspace = true}
directories = {workspace = true}
once_cell = {workspace = true}
penumbra-keys = {workspace = true, default-features = false}
serde = {workspace = true, features = ["derive"]}
serde_json = {workspace = true}
serde_with = {workspace = true, features = ["hex"]}
toml = {workspace = true, features = ["preserve_order"]}
tracing = {workspace = true}
tracing-subscriber = {workspace = true, features = ["env-filter", "ansi"]}

[dev-dependencies]
assert_cmd = {workspace = true}
predicates = "2.1"
regex = {workspace = true}
tempfile = {workspace = true}
Suggested change:
- [dependencies]
- anyhow = {workspace = true}
- directories = {workspace = true}
- once_cell = {workspace = true}
- penumbra-keys = {workspace = true, default-features = false}
- serde = {workspace = true, features = ["derive"]}
- serde_json = {workspace = true}
- serde_with = {workspace = true, features = ["hex"]}
- toml = {workspace = true, features = ["preserve_order"]}
- tracing = {workspace = true}
- tracing-subscriber = {workspace = true, features = ["env-filter", "ansi"]}
- [dev-dependencies]
- assert_cmd = {workspace = true}
- predicates = "2.1"
- regex = {workspace = true}
- tempfile = {workspace = true}
+ [dev-dependencies]
+ anyhow = { workspace = true }
+ assert_cmd = { workspace = true }
+ directories = { workspace = true }
+ once_cell = { workspace = true }
+ penumbra-keys = { workspace = true, default-features = false }
+ predicates = "2.1"
+ regex = { workspace = true }
+ serde = { workspace = true, features = ["derive"] }
+ serde_json = { workspace = true }
+ serde_with = { workspace = true, features = ["hex"] }
+ tempfile = { workspace = true }
+ toml = { workspace = true, features = ["preserve_order"] }
+ tracing = { workspace = true }
+ tracing-subscriber = { workspace = true, features = ["env-filter", "ansi"] }
these should all be `dev-dependencies`, since this crate only contains tests.
otherwise, `cargo build --package migration-test` will mean building all of these crates, despite the `main.rs` being empty and the `--features migration-test` not being provided
Downgrading to draft until I address the review comments. Overall, the shape of this approach is rather heavy. Will continue to discuss testing options with @cratelyn, so that we maximize what's possible with mock-consensus, but still retain integration tests for migration behavior. Notably, this PR does not catch bugs in restarting a multi-validator setup post-upgrade, e.g. #4508.
This draft work successfully caught several early-stage issues with migrations, but only of the simpler variety: when migrations fail to apply to halted state, for instance. More complex situations, such as the network failing to come back up in multi-validator setups due to app logic problems, were not caught by these changes, and instead required a more involved approach that leveraged an online multi-validator network. Given the CI changes in #4678, we should think hard about what an adequate testing environment looks like for migrations. For the near future, I recommend we stick to creating devnets and performing candidate migrations against them, ensuring that they behave well when restarted across the upgrade boundary. Longer-term, there's detail in #4323 and also #4265 for how to approach this problem comprehensively.
Describe your changes
Adds intra-CI standalone testing of migration behavior. This can be run locally on developer workstations, and also in CI. The job is currently taking >20m, and requires a lot of disk space, but it's worth it for the assurance.
Building on the smoke-test rewrite to use process-compose, let's script the migration process, so that we can test current HEAD of the monorepo against a prior tagged version, and validate that necessary migrations are in place.
One possible approach is to fetch prebuilt binaries from uploaded artifacts on GitHub. That's fine for `pd`, but doesn't work for running the smoke tests, due to client/server incompatibility. Therefore we'll clone the entire repo in a git-ignored subdir, and build the old binaries there. Heavy, but reliable.
Adds a new Rust crate, strictly for running the migration-test suite of integration tests, which is very similar in nature to the already-existing network-integration tests AKA smoke tests. Copy/pastes a lot of code from the smokes, which we can always factor out into reusable utils, but not bothering with that right now.
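As a rough illustration of the clone-and-build step described above, the sketch below only constructs the commands and does not run them. The function names, the use of `git worktree add`, the `deployments/worktrees/<tag>` layout (borrowed from the commented-out `WORKING_DIR` example in the review), and the `cargo build --release --bin pd` invocation are all assumptions for illustration, not the scripts in this PR.

```rust
use std::path::{Path, PathBuf};
use std::process::Command;

/// Hypothetical location for an old checkout; the real layout may differ.
fn worktree_dir(tag: &str) -> PathBuf {
    Path::new("deployments").join("worktrees").join(tag)
}

/// Builds (but does not run) a command that would check out `tag`
/// into its own worktree. Assumes `git worktree add <path> <tag>`.
fn checkout_cmd(tag: &str) -> Command {
    let mut cmd = Command::new("git");
    cmd.arg("worktree").arg("add").arg(worktree_dir(tag)).arg(tag);
    cmd
}

/// Builds (but does not run) a command that would compile the old `pd`
/// binary from inside that worktree. The cargo flags are an assumption.
fn build_cmd(tag: &str) -> Command {
    let mut cmd = Command::new("cargo");
    cmd.args(["build", "--release", "--bin", "pd"])
        .current_dir(worktree_dir(tag));
    cmd
}
```

A caller would invoke `.status()` on each command in turn and bail out on a non-zero exit. Keeping construction separate from execution makes the plan easy to inspect or log before committing to a long build.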
Issue ticket number and link
Refs #4323.
Checklist before requesting a review
If this code contains consensus-breaking changes, I have added the "consensus-breaking" label. Otherwise, I declare my belief that there are not consensus-breaking changes, for the following reason:
Review, running locally
Check out this branch, and run `just migration-test`. It'll take a while to build! The test as written duplicates the local repo, so plan on ~100GB of disk space utilization.