tests(pd): automated migration testing #4337

conorsch · 2024-05-06T23:53:43Z

Describe your changes

Adds intra-CI standalone testing of migration behavior. This can be run locally on developer workstations, and also in CI. The job is currently taking >20m, and requires a lot of disk space, but it's worth it for the assurance.

Building on the smoke-test rewrite to use process-compose, let's script the migration process, so that we can test
current HEAD of the monorepo against a prior tagged version, and validate that necessary migrations are in place.

One possible approach is to fetch prebuilt binaries from uploaded artifacts on GitHub. That's fine for `pd`, but doesn't work for running the smoke tests, due to client/server incompatibility. Therefore we'll clone the entire repo in a git-ignored subdir, and build the old binaries there. Heavy, but reliable.
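The clone-and-build approach could be scripted roughly as follows. This is a minimal sketch under stated assumptions, not the script from this PR: the `deployments/worktrees/<tag>` layout follows the path that appears in the process-compose config, while the helper names, the placeholder repo URL, and the exact `git` and `cargo` arguments are hypothetical.

```rust
use std::path::{Path, PathBuf};
use std::process::Command;

/// Directory where an older release is checked out (hypothetical helper).
/// Assumes the `deployments/worktrees/<tag>` layout and that it is git-ignored.
fn worktree_dir(repo_root: &Path, tag: &str) -> PathBuf {
    repo_root.join("deployments").join("worktrees").join(tag)
}

/// Describes the commands that would clone `tag` into the worktree directory
/// and build the old binaries there. Nothing is executed by this function.
fn old_binary_build_steps(repo_url: &str, repo_root: &Path, tag: &str) -> Vec<Command> {
    let dir = worktree_dir(repo_root, tag);

    // Step 1: clone the repo at the prior tagged version.
    let mut clone = Command::new("git");
    clone.args(["clone", "--branch", tag, repo_url]).arg(&dir);

    // Step 2: build matching client and server binaries inside the checkout.
    // The binary names passed to `--bin` are assumptions for illustration.
    let mut build = Command::new("cargo");
    build
        .args(["build", "--release", "--bin", "pd", "--bin", "pcli"])
        .current_dir(&dir);

    vec![clone, build]
}

fn main() {
    // The URL is a placeholder, not the real repository location.
    let steps = old_binary_build_steps(
        "https://example.invalid/repo.git",
        Path::new("."),
        "v0.73.1",
    );
    for step in &steps {
        // Print rather than run, so the sketch has no side effects.
        println!("{:?}", step);
    }
}
```

Building both binaries from the same checkout is what avoids the client/server incompatibility: the old `pd` is always exercised by a client from the same tag.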

Adds a new rust crate, strictly for running the migration-test suite of integration tests, which is very similar in nature to the already-existing network-integration tests AKA smoke tests. Copy/pastes a lot of code from the smokes, which we can always factor out into reusable utils, but not bothering with that right now.

Issue ticket number and link

Refs #4323.

Checklist before requesting a review

If this code contains consensus-breaking changes, I have added the "consensus-breaking" label. Otherwise, I declare my belief that there are not consensus-breaking changes, for the following reason:

Testing-only, no changes to application logic.

Review, running locally

Check out this branch, and run `just migration-test`. It'll take a while to build! The test as written duplicates the local repo, so plan on ~100GB of disk space utilization.

hdevalence · 2024-05-07T03:53:50Z

Confoundingly, I'm getting a proto incompatible error, about missing AuctionParams.

I think this was an actual bug: #4338. We shouldn't assume the migrations actually work until we do the testing. See also #4340.

conorsch · 2024-05-29T20:45:49Z

Marking as ready for review, so I can stop rebasing it. 😅 The immediate consequence of merging this PR is that CI runs will take >20m again, but I'm happy to dial that back and rely on the cron schedule. For now, though, having it run on every PR will give us confidence about the state of migrations as we push toward #4497.

cratelyn

i'm going to somewhat cautiously hit the "approve" button here. thanks so much for your work to give us automated coverage of migrations. that's tremendously valuable work.

we should have automated migration testing, and i want that sooner than later, but.. i'm a little nervous about how these are shaped. they feel somewhat heavy on boilerplate yaml, and there's a lot of shell glue surrounding the actual test.

i think a lot of the logic here could also be (and is, in the case of delegating) covered by mock consensus tests, but the broader vision of having integration tests that run a real CometBFT node feels topically relevant for migrations.

i've left a smattering of comments below. i'm happy to see this merge before those things are addressed, being acutely aware that this makes life easier while managing upgrades ❤️ ...but maybe file a tracking ticket(s) for following up on these points.

cratelyn · 2024-05-30T04:37:37Z

crates/test/migration-test/Cargo.toml

+# By default, don't enable migration-tests: require explicit opt-in via
+# `--features migration-test`.
+default = []
+migration-test = []


cratelyn · 2024-05-30T04:41:07Z

crates/test/migration-test/src/main.rs

+fn main() {
+    println!("Hello, world!");
+}


Suggested change

-fn main() {
-    println!("Hello, world!");
-}
+//! This crate is used to group migration tests, this binary does not do anything.
+fn main() {
+    println!("Hello, world!");
+}

this empty file was a little confusing for a moment, so a comment might help signal that this main.rs is only here as a stub.

cratelyn · 2024-05-30T04:53:05Z

crates/test/migration-test/Cargo.toml

@@ -0,0 +1,35 @@
+[package]
+name = "migration-test"


Suggested change

-name = "migration-test"
+name = "penumbra-migration-test"

a nit, but this should probably have the same penumbra- prefix as other crates in the workspace.

cratelyn · 2024-05-30T04:55:43Z

deployments/compose/process-compose.yml

+# By default, build pd from the workspace. Support overriding via a deeper git-worktree,
+# so that an older version of pd can be built and run. This helps when running older
+# networks locally, to debug migrations.
+vars:
+  WORKING_DIR: .
+  # WORKING_DIR: deployments/worktrees/v0.73.1


i appreciate these thoughtful breadcrumbs for future tinkering. nice 🪙

cratelyn · 2024-05-30T05:02:36Z

deployments/compose/process-compose-migration-test-1.yml

it's a bit difficult to understand what this is testing. my understanding is that this is running the smoke tests on the v0.76.0 release? i don't think i am connecting the dots to how this is exercising migration logic.

it'd be helpful to (a) name these files more communicatively (possibly via putting them in a folder together), and (b) put a comment at the top of the file, in place of the boilerplate there right now. this isn't a configuration for running migration tests, ~~it is a migration test, aiui.~~

...as i've read this more, i see these are phases. i'll mark this as resolved, but i did find coming to this understanding to be difficult. 😓

cratelyn · 2024-05-30T05:30:27Z

crates/test/migration-test/tests/network_integration.rs

+
+/// TOML for an "upgrade-plan" governance proposal.
+// Intentionally avoiding importing this type to adhere to strict
+// CLI interfacts for the pcli binaries.


Suggested change

// CLI interfacts for the pcli binaries.

// CLI interfacts for the pcli binaries.

i think a sentence got garbled here!

cratelyn · 2024-05-30T05:31:09Z

crates/test/migration-test/tests/network_integration.rs

+    height: u64,
+}
+
+#[cfg_attr(not(feature = "migration-test"), ignore)]


Suggested change

-#[cfg_attr(not(feature = "migration-test"), ignore)]
+#[cfg(feature = "migration-test")]

ditto, another conditional compilation attribute we can simplify
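The two attributes in this suggestion are not quite equivalent, which a small sketch can make concrete. With `cfg_attr(not(feature = ...), ignore)` the test is always compiled and is only skipped when the feature is off; with `cfg(feature = ...)` the test is not compiled at all without the feature. The function names below are hypothetical, and the block assumes it is built without the `migration-test` feature.

```rust
/// True only when the crate is compiled with `--features migration-test`.
fn migration_tests_enabled() -> bool {
    cfg!(feature = "migration-test")
}

/// Variant A: always compiled and type-checked, but reported as "ignored"
/// by the test harness unless the feature is enabled.
#[test]
#[cfg_attr(not(feature = "migration-test"), ignore)]
fn variant_a_ignored_without_feature() {
    // Migration steps would go here.
}

/// Variant B: removed from the build entirely unless the feature is enabled.
#[cfg(feature = "migration-test")]
#[test]
fn variant_b_absent_without_feature() {
    // Migration steps would go here.
}

fn main() {
    println!(
        "migration-test feature enabled: {}",
        migration_tests_enabled()
    );
}
```

One practical difference: variant A can still be forced to run with `cargo test -- --ignored`, and it surfaces compile errors in a plain `cargo test`, whereas variant B only exists when the feature is passed explicitly.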

deployments/scripts/migration-test

deployments/scripts/generate-config-for-bootstrap-local-node

cratelyn · 2024-05-30T05:43:37Z

crates/test/migration-test/Cargo.toml

+[dependencies]
+anyhow = {workspace = true}
+directories = {workspace = true}
+once_cell = {workspace = true}
+penumbra-keys = {workspace = true, default-features = false}
+serde = {workspace = true, features = ["derive"]}
+serde_json = {workspace = true}
+serde_with = {workspace = true, features = ["hex"]}
+toml = {workspace = true, features = ["preserve_order"]}
+tracing = {workspace = true}
+tracing-subscriber = {workspace = true, features = ["env-filter", "ansi"]}
+
+[dev-dependencies]
+assert_cmd = {workspace = true}
+predicates = "2.1"
+regex = {workspace = true}
+tempfile = {workspace = true}


Suggested change

-[dependencies]
-anyhow = {workspace = true}
-directories = {workspace = true}
-once_cell = {workspace = true}
-penumbra-keys = {workspace = true, default-features = false}
-serde = {workspace = true, features = ["derive"]}
-serde_json = {workspace = true}
-serde_with = {workspace = true, features = ["hex"]}
-toml = {workspace = true, features = ["preserve_order"]}
-tracing = {workspace = true}
-tracing-subscriber = {workspace = true, features = ["env-filter", "ansi"]}
-
-[dev-dependencies]
-assert_cmd = {workspace = true}
-predicates = "2.1"
-regex = {workspace = true}
-tempfile = {workspace = true}
+[dev-dependencies]
+anyhow = { workspace = true }
+assert_cmd = { workspace = true }
+directories = { workspace = true }
+once_cell = { workspace = true }
+penumbra-keys = { workspace = true, default-features = false }
+predicates = "2.1"
+regex = { workspace = true }
+serde = { workspace = true, features = ["derive"] }
+serde_json = { workspace = true }
+serde_with = { workspace = true, features = ["hex"] }
+tempfile = { workspace = true }
+toml = { workspace = true, features = ["preserve_order"] }
+tracing = { workspace = true }
+tracing-subscriber = { workspace = true, features = ["env-filter", "ansi"] }

these should all be dev-dependencies, since this crate only contains tests.

otherwise, `cargo build --package migration-test` will mean building all of these crates, despite the `main.rs` being empty and the `--features migration-test` not being provided

conorsch · 2024-06-03T15:40:48Z

Downgrading to draft until I address the review comments. Overall, the shape of this approach is rather heavy. Will continue to discuss testing options with @cratelyn, so that we maximize what's possible with mock-consensus, but still retain integration tests for migration behavior. Notably, this PR does not catch bugs in restarting a multi-validator setup post-upgrade, e.g. #4508.

conorsch · 2024-06-28T21:48:26Z

This draft work successfully caught several early-stage issues with migrations, but only of the simpler variety: when migrations fail to apply to halted state, for instance. More complex situations such as the network failing to come back up in multi-validator setups due to app logic problems were not caught by these changes, and instead required a more complicated setup that leveraged an online multi-validator setup.

Given the CI changes in #4678, we should think hard about what an adequate testing environment for migrations looks like. For the near future, I recommend we stick to creating devnets and performing candidate migrations against them, ensuring that they behave well when restarted across the upgrade boundary. Longer-term, there's detail in #4323 and also #4265 for how to approach this problem comprehensively.

conorsch force-pushed the migration-testing branch from c2d509c to 5ab68dc Compare May 7, 2024 16:18

conorsch force-pushed the migration-testing branch from bd217fc to 05b93f0 Compare May 7, 2024 16:26

conorsch force-pushed the migration-testing branch from 05b93f0 to 06517b9 Compare May 7, 2024 16:30

conorsch force-pushed the migration-testing branch from 06517b9 to d78f2a5 Compare May 7, 2024 16:43

conorsch mentioned this pull request May 7, 2024

storage: possible race in migration logic #4344

Closed

conorsch force-pushed the migration-testing branch from d78f2a5 to 7d5fdcd Compare May 7, 2024 17:24

conorsch force-pushed the migration-testing branch from 7d5fdcd to a57489b Compare May 7, 2024 18:33

conorsch mentioned this pull request May 7, 2024

fix(migration): release storage locks #4347

Merged

conorsch force-pushed the migration-testing branch from a57489b to 942fec9 Compare May 7, 2024 20:04

conorsch force-pushed the migration-testing branch from 942fec9 to dc0ee03 Compare May 7, 2024 20:58

conorsch force-pushed the migration-testing branch from dc0ee03 to 5b08436 Compare May 8, 2024 01:13

conorsch force-pushed the migration-testing branch from 5b08436 to 7c89436 Compare May 8, 2024 01:22

cratelyn added this to the Sprint 6 milestone May 8, 2024

cratelyn added the C-enhancement, A-CI/CD, A-testing, and A-upgrades labels May 8, 2024

conorsch force-pushed the migration-testing branch from d6f2d06 to ad5476f Compare May 29, 2024 00:18

conorsch force-pushed the migration-testing branch from ad5476f to 9794e24 Compare May 29, 2024 00:59

conorsch force-pushed the migration-testing branch from 9794e24 to bd0c544 Compare May 29, 2024 15:52

conorsch force-pushed the migration-testing branch from bd0c544 to 542158b Compare May 29, 2024 16:37

conorsch force-pushed the migration-testing branch from 542158b to 44d47b4 Compare May 29, 2024 20:07

conorsch changed the title ~~feat(tests): automated migration testing~~ tests(pd): automated migration testing May 29, 2024

conorsch requested a review from cratelyn May 29, 2024 20:08

conorsch marked this pull request as ready for review May 29, 2024 20:44

cratelyn approved these changes May 30, 2024

conorsch force-pushed the migration-testing branch from 44d47b4 to eb3c0e3 Compare May 30, 2024 18:12

conorsch force-pushed the migration-testing branch from eb3c0e3 to af07e4f Compare May 30, 2024 19:09

remove force flag from migration test

2a6ee60

conorsch marked this pull request as draft June 3, 2024 15:39

cratelyn modified the milestones: Sprint 7, Sprint 8 Jun 4, 2024

erwanor mentioned this pull request Jun 10, 2024

Enforce string limits for deserialization #4567

Merged

conorsch closed this Jun 28, 2024
