Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(run): Add the --dry option to the run command #550

Draft
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

paul-rodriguez
Copy link
Contributor

@paul-rodriguez paul-rodriguez commented Jan 8, 2025

The --dry option changes the behavior of the run command to only execute
the setting-up of the pebble daemon without actually starting it.
Side-effects of initialization on the state file are also disabled.

This allows checking the validity of a plan without launching pebble
proper.
For example, if you have a plan layer at /tmp/layers/123-layer.yaml, you
can check it using PEBBLE=/tmp go run ./cmd/pebble run --dry provided
this command is invoked from the root of this repository.

The --dry option changes the behavior of the run command to only execute
the setting-up of the pebble daemon without actually starting it.
Side-effects of initialization on the state file are also disabled.

This allows checking the validity of a plan without launching pebble
proper.
For example, if you have a plan layer at /tmp/layers/123-layer.yaml, you
can check it using `PEBBLE=/tmp go run ./cmd/pebble run --dry` provided
this command is invoked from the root of this repository.
@flotter
Copy link
Contributor

flotter commented Jan 8, 2025

Just a little bit of background - @paul-rodriguez work is based on a CTO approved spec. The reason I mention this is because the approved design would allow for a dry run to make a DryStart call to each overlord manager. Right now we do not have a need as the primary validation we need as part of dry run is Plan Validation, which happens at the end of overlord init. However, we would easily be able to increase the reach of this later by adding the stateengine optional DryStart call (similar to StartUp, but only containing non-state changing validation code for interested managers).

@benhoyt
Copy link
Contributor

benhoyt commented Jan 8, 2025

Moving Mattermost conversation here for reference:


@paul-rodriguez: As you will see, the code is quite simple. So my instinct is to suspect that there must be something I'm missing. All I'm really doing to achieve a dry-run of pebble is disabling writes to the state backend file and interrupting the starting process immediately after the plan has been loaded (and therefore validated). What's your opinion about this? Anything missing? Does this actually correspond to your idea of what dry-run should be in the first place?

@cjdcordeiro: This is great timing 😅 I just told my team about this PR. Somone, probably @rebornplusplus, will be reviewing it (tomorrow or Friday most likely). Conceptually, yes it makes sense to me. A few questions I may have are:

  1. how does dry-run combine with other run options? I guess it could also work with options like --create-dirs and --hold in order to assert that such options can be applied. I do have mixed feelings about this though, cause while it sounds useful, it also feels like it goes beyond the scope of --dry-run, cause for options like --hold, you'd need to scan ports, etc. So maybe it's not something to consider for now
  2. since run creates and accesses files, should the --dry-run be verbose about the paths it is about to touch? E.g. a --dry-run might succeed on a well-written plan, but then run fails if the $PEBBLE or $PEBBLE_SOCKET are not accessible

@flotter: It may be difficult to achieve 2 I think, but with PEBBLE_DEBUG=1 at least we could instrument some path with logger.Debug and it would then show on CLI. At least you will see the error that caused the dry-run to fail.

@paul-rodriguez: I think it would make sense for run --dry to say if it's going to write anything, but ideally that should just never happen, otherwise it's not really a dry run. Do note that right now if you do run --dry and the PEBBLE directory isn't readable for some reason, it will error out.

Right now the answer to your first question is that the other options behave exactly in the same way, but of course they might therefore not have any effect because they influence something that never happens in the --dry case. --hold is such an option. In a sense, --dry implies --hold.

I think there's a legitimate case for --create-dirs being an option that the user may want to use or not in combination with --dry though to be honest, it strikes me as odd. If you use --create-dirs it implies your layers directory is empty, so there's no need for validation.

@benhoyt
Copy link
Contributor

benhoyt commented Jan 8, 2025

I think this is a good starting point for discussion, however, I think pebble run --dry either needs to be a lot fuller, or we need to change the approach to pebble validate and just validate the plan (I'd prefer that approach as it's very clear what it does and does not do, though I realise that re-opens the spec discussion). However, assuming we stick with pebble run --dry, the full text of that part of the spec (KO026) says:

A new command-line option will be added to the run command (--dry) that will prevent the daemon from being started. The validation logic should be performed in the daemon and overlord constructors.

Which admittedly is pretty sparse on details. But I believe a dry-run option 1) definitely shouldn't perform changes/writes, and 2) should try to dry-run as much as it can. If I use the AWS CLI, I expect to be able to tack on --dry-run and it'll do all the things and handle all the other command line arguments passed, but "dry" (that is, no changes, just print out what it would do).

As it stands, this PR is not fulfilling #1, because if --create-dirs is specified and the dir doesn't exist, it'll create it (I realise you normally wouldn't specify --create-dirs if you just want validation, but a dry run shouldn't mutate things). Similar for PEBBLE_COPY_ONCE -- that's a mutation.

It's also not fulfilling #2. I don't think it should necessarily check ports for --http as that's a dynamic operation, but it should at least check the other command line arguments and whatnot. Currently it's stopping before checking --args (convertArgs) and also before it validates the --identities file.

Some of these will be tricky to get right (or even specify), hence my preference for pebble validate or a command that just explicitly validates the plan. However, if we do proceed along pebble run --dry lines, I think we need to do a fuller implementation.

@@ -62,22 +63,29 @@ var sharedRunEnterArgsHelp = map[string]string{
"--args": "Provide additional arguments to a service",
"--identities": "Seed identities from file (like update-identities --replace)",
}
var runArgsHelp = map[string]string{
"--dry": "Start {{.DisplayName}} without side-effects",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't "start" Pebble though, right? I think we need a more accurate description here.

}

func (osb *overlordStateBackend) Checkpoint(data []byte) error {
if osb.DryRun {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of checking a flag in Checkpoint, could we provide a different type for the state backend, a do-nothing implementation?

@flotter
Copy link
Contributor

flotter commented Jan 9, 2025

I think conceptionally the goal here is to have a way to "test-run" pebble and derivative projects as part of craft artefact creation (i.e. Rockcraft and friends) to detect incompatibilities between generated metadata, the actual binary version and the expected run-time requirements.

Already today, even if we only consider the Plan, derivative Pebble projects would be required to consult additional metadata files to validate the plan. For example, in the craft artefact, is user, user-id, group and group-id in services valid? Also, with the addition of workgroups, the supported workgroups currently will come from a different source, not the plan itself.

So I think we need to solve this as part of this exercise - where do we place additional system validation ? I think the idea of the spec is to say that beyond Plan schema validation, each manager (i.e. servstate) could perform early validation.

The ideal way in my head would be to run the pebble binary in a way that is as close as possible to the real startup, because otherwise we would be maintaining two complete separate entry points.

Also, as per the examples above, "validation" may be manager specific, which could be difficult to pull out outside the daemon/overlord framework, so it may have to live inside the daemon during the normal startup flow. Pebble itself is the simplest case, but at least one of the derivative projects, there are a lot of pre-daemon startup code that will also form part of the "test-run", expecting various additional files to exist.

@paul-rodriguez
Copy link
Contributor Author

I also agree that pebble validate (or validate-plan) is a better name for what the implementation is doing here. My preference would be to stop there as this is what's on our roadmap (not dry-run), and I still feel that dry-run in the case of pebble run is not well-defined enough. That's because pebble run starts a long running process, it's not really a "go from state A to state B" kind of operation, so it's going to be tricky to produce an output such as "if I had run pebble run then X; Y; Z would have happened" and in any case, well outside the scope of what I need for my own purposes.

That said, assuming the goal is to get a meaningful dry-run beyond checking the plan, I'll push a few fixes for 1) and 2) in Ben's message above.

@paul-rodriguez
Copy link
Contributor Author

I have pushed a couple of fixes that should address the issues raised.

Note that I chose to skip adding an explicit message saying "pebble would use identities X and Y if it ran" in the case where --dry is used with --identities (and there's no check failure). I had this in an earlier version, but it felt too cumbersome.

I think all these switches make the runDaemon function hard to parse. The "real" and "dry" implementations are sufficiently different that personally I think there's more value in separating them into two comparatively simpler functions rather than keeping a single one. It's not obvious to me that a single function would be lighter to maintain either.

@paul-rodriguez paul-rodriguez requested a review from benhoyt January 9, 2025 15:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants