Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KEP-28: Transient parameters #1450

Open
wants to merge 12 commits into
base: main
Choose a base branch
from
216 changes: 216 additions & 0 deletions keps/0028-resettable-parameters.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,216 @@
---
kep-number: 28
short-desc: A parameter flag to reset parameter values after a plan is executed
title: Resettable parameters
authors:
- @aneumann82
owners:
- @aneumann82
editor:
creation-date: 2020-03-31
last-updated: 2020-03-31
status: provisional
see-also:
replaces:
superseded-by:
---

## Summary

This KEP describes addition of a flag for parameters that allows a parameter to be reset after it was used in a plan.

## Motivation

The new flag allows an operator to define a parameter that is basically one-time-use. Especially for manually triggered
plans this will be an often used case:
- Start a backup with a specific name
- Evict a specific pod from a stateful set
- Start a repair plan for a specific node or datacenter

All these examples have in common that they require input parameters. The parameters are required, and the user should be
forced to set them. If we do not have resettable parameters, it might happen that a parameters are still set from a
previous execution and the user forgets to set it.

When parameters are marked as resettable, they are set to the default value after plan execution, and KUDO can error
out if a required parameter is not set.

### Goals

Make it possible to automatically reset a parameter after a plan is executed.

### Non-Goals

- Set parameters to specific values (except for a default)
- Set parameters at other moments than at the end of a plan

## Proposal 1

Add an additional attribute `resettable` to parameter specifications in `params.yaml`:

```yaml
- name: BACKUP_NAME
description: "The name under which the backup is saved."
resettable: "true"
```

The default value for this flag would be `false`.

If the flag is set to `true`, the parameter will be set to the default value when *any* plan finishes. This change
of parameter value should *not* trigger any plan execution. This is the preferred proposal.

An alternative would be a "string" type parameter that allows a user to set plans after which the parameter
is reset:

```yaml
- name: BACKUP_NAME
description: "The name for backups to create or restore."
resetAfterPlan: [ "backup", "restore" ]
```

This would reset the parameter after the plan `backup` is executed. The downside with this approach is that a parameter
could be set at some point and then be unknowingly used later.

Pros:
- Both variants would be an easy extension for parameters from the definition point of view

Cons:
- The parameters for a specific plan are not separated from "normal" parameters
- It's not easy to determine

## Proposal 2

Add new task type, `SetParameters`:
```yaml
- name: backup-done
kind: SetParameters
spec:
params:
- name: 'RESTORE_FLAG'
value: nil
```
This is a lot more powerful, but also provides a lot more ways to introduce complexity: Parameter values could change
inside a plan execution, what about triggered plans from param changes, etc.

Pros:
- Very powerful
- Would allow easy extension to set parameters to custom values

Cons:
- Very complex
- Parameters could change while the plan is executed
- What happens when a plan is triggered by a changed parameter

## Proposal 3

Specific plan parameters: These parameters would only be valid inside a specific plan and could be defined inside the
operator:

```yaml
plans:
backup:
params:
- name: 'NAME'
description: "The name under which the backup is saved."
strategy: serial
phases:
- name: nodes
strategy: parallel
steps:
- name: node
tasks:
- backup
```

Alternatively it might be possible to define this in the `params.yaml`
```yaml
- name: backup.NAME
plan: backup
description: "The name under which the backup is saved."
```

These parameters would only be valid during the execution of that plan.

To use the parameter, it would have to be prefixed with the plan name in which it is defined:
`kudo update <instance> -p backup.NAME=MyBackup`

```
apiVersion: batch/v1
kind: Job
metadata:
name: {{ $.PlanParams.backup.NAME }}
```

Pros:
- Would work well for manually triggered plans
- The parameter scope would be clearly defined

Cons:
- Could potentially increase the size of the operator.yaml


## Proposal 4

Define a list of parameters to be reset when a plan is finished:

```yaml
plans:
backup:
resetAfterPlan:
- BACKUP_NAME
strategy: serial
phases:
- name: nodes
strategy: parallel
steps:
- name: node
tasks:
- backup
```

Pros:
- Does not add a new flag on parameter definition

Cons:
- It won't be obvious from the parameter list that this is a plan specific parameter
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is a Proposal 5:

TL;DR: we use recently introduced Spec.PlanExecution to hold transient parameter values. So a CLI command like:

$ k kudo plan trigger backup --BACKUP_NAME foo

will result in the following PlanExecution:

...
spec:
  planExecution:
    planName: backup
    uid: xxx-yyy-zzzz
    parameters:
        BACKUP_NAME: foo

And the planPlanExecution is already reset after the plan is terminal so not much to do there.

Pros:

  • This design captures closely the mental model of passing parameters with a triggered plan and leaves the parameters where they belong: with the current plan execution

Cons:

  • This works best with manually triggered plans which are, I believe, the 90/01 use case. Mixing transient and normal parameters during a plan execution triggered by a parameter update seem very confusing to me so I would want to avoid this

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if that's a new Proposal, it looks a little bit like an implementation variant of Proposal 3?

I don't see any kind of parameter definition in your proposal, would that be part of it somewhere?

With regards to mixing transient and normal parameters: Have a look at the Restore operator User story. In that case it's not a parameter update, but the installation, but it has a convincing use case for mixing transient and normal parameters.
I'm not sure if we could find a use case for parmeter updates, but I wouldn't rule it out.

In any case, I assume you're talking about updating/setting transient and normal parameters, correct? Because we certainly need to read/use both kinds when rendering specific resources.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if that's a new Proposal, it looks a little bit like an implementation variant of Proposal 3?

Sort of. Though where we persist parameters in the Instance is an important enough detail 😉

I don't see any kind of parameter definition in your proposal, would that be part of it somewhere?

There is no need for changing the parameter definitions.

In that case, it's not a parameter update, but the installation, but it has a convincing use case for mixing transient and normal parameters.

Yeah, I saw it. So k kudo install ... -p RESTORE_NAME=foo command would end up with a PlanExecution like:

...
spec:
  planExecution:
    planName: deploy
    parameters:
        RESTORE_NAME: foo

This still fits the model nicely.

In any case, I assume you're talking about updating/setting transient and normal parameters, correct?

Exactly. A simple k kudo update should not mix both IMHO. But even if there is an absolutely compelling story for it, we could still do it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've thought about this a bit more:

I'm not sure if that's a new Proposal, it looks a little bit like an implementation variant of Proposal 3?

Sort of. Though where we persist parameters in the Instance is an important enough detail 😉

I think storing the parameters in the planExecution makes a lot of sense in all proposals, I've updated the sections accordingly.

I don't see any kind of parameter definition in your proposal, would that be part of it somewhere?

There is no need for changing the parameter definitions.

I'm not sure this is true. If we don't change the param definitions, it would be possible to set a BACKUP_NAME in a kudo update invocation. It would then be stored in the permanent section in the instance. KUDO wouldn't have any way to determine that a parameter is (or should) only be used in kudo plan trigger.

In that case, it's not a parameter update, but the installation, but it has a convincing use case for mixing transient and normal parameters.

Yeah, I saw it. So k kudo install ... -p RESTORE_NAME=foo command would end up with a PlanExecution like:

...
spec:
  planExecution:
    planName: deploy
    parameters:
        RESTORE_NAME: foo

This still fits the model nicely.

It does, KUDO still needs a way to know which parameters end up in spec.planExecution.parameters and which one should be in spec.parameters.

In any case, I assume you're talking about updating/setting transient and normal parameters, correct?

Exactly. A simple k kudo update should not mix both IMHO. But even if there is an absolutely compelling story for it, we could still do it.

I think this use case will be common enough that we should consider it. I think it would be good to be able to distinguish between permanent and transient parameters in the invocation though:

kudo install -p NODE_COUNT=3 -p install.RESTORE_BACKUP_NAME=MyBackup
vs
kudo install -p NODE_COUNT=3 -p RESTORE_NAME=MyBackup

woud make it at least a bit more clear that something is different between the parameters.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does, KUDO still needs a way to know which parameters end up in spec.planExecution.parameters and which one should be in spec.parameters.

If a parameter is in spec.planExecution it's transient. If it's in the spec.parameters it's persistent.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a parameter is in spec.planExecution it's transient. If it's in the spec.parameters it's persistent.

Yes, for the KUDO manager side that's all good and well, but how does the KUDO CLI decide whether to put a parameter into spec.planExecution or spec.parameters?

Copy link
Contributor

@zen-dog zen-dog May 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. if you kudo plan trigger -p ... we treat the parameters as transient
  2. kudo update --instance -p ... we treat the parameters as persistent

The rationale behind it being: if you need to update parameters and trigger a plan, you need simply update the parameters - the plan is triggered anyway. Triggering a plan directly (1) doesn't need -p so we can treat the parameters as transient.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. If you kudo plan update you can't set transient parameters. I agree with that
  2. If you kudo plan trigger you can't change permanent parameters. I kind of agree with that. It might be nice, but it's not really necessary.
  3. For kudo install I think there are reasonable use cases to specify both transient and permanent parameters.

The big issue I have when the operator developer can't specify which parameters are transient and which are permanent is that the user can accidentally use them in the wrong context:

For example:

  • BACKUP_NAME transient parameter
  • NODE_COUNT permanent parameter

If we follow your approach, there is no way to prevent a user from
a) kudo update --instance -p BACKUP_NAME=asdf Now the BACKUP_NAME is stored in spec.parameters where it really doesn't belong
b) kudo plan trigger -p NODE_COUNT=5 The given value would be in spec.parameters with the old value and in spec.planExecution with a transient value. If the triggered plan uses NODE_COUNT it has to decide which value to use. Additionally, the user might expect the NODE_COUNT to be saved.

So, TL;DR: We can't error out when a user uses a parameter in the wrong context and it might lead to wrong assumptions how a parameter is used



### User Stories

- [#1395](https://github.com/kudobuilder/kudo/issues/1395) Resettable parameters

#### A backup plan for the Cassandra Operator

A plan that starts a backup for the whole cluster which is manually triggered

It has a BACKUP_NAME parameter that specifies the name of a backup that is to be created.
This parameter needs to be unique on every execution and should not be reused. If the parameter is not unset after the
backup plan is finished, a user could forget to set it again for the next execution.

- Plan is manually triggered

#### The restore operation in the Cassandra Operator

The restore operation on the Cassandra Operator can be used to create a new cluster from an existing backup.

It uses a RESTORE_FLAG parameter that can be used on installation of a new C* cluster to restore an existing backup.
It sets up an initContainer that downloads the data and prepares the new cluster to use this data.
The initContainer is only used on the very first start of the cluster and should not exist on subsequent restarts of
the nodes, additionally the RESTORE_FLAG is useless/not used after the initial deploy plan is done.

- Plan is `deploy`, used on installation

### Implementation Details/Notes/Constraints

- The parameter reset should happen if a plan reaches a terminal state, either `COMPLETED` or `FATAL_ERROR`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure you want it on FATAL

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure. I think FATAL_ERROR is a valid case to reset, as the plan is finished.
If the plan should be restarted, the user would have to set the parameter again. (An possibly fix the problem that led to the FATAL_ERROR)


### Risks and Mitigations



## Implementation History

- 2020-03-31 - Initial draft. (@aneumann)
- 2020-04-03 - Added alternatives and user stories

## Alternatives