[DNM] (review-only) WIP: OLM v1 API review #2067

everettraven · 2024-10-17T14:06:51Z

The Operator Framework group has been working on a re-vamp of the Operator Lifecycle Manager project that we have dubbed "OLM v1". Our upstream project has adopted the OpenShift API conventions and since we intend for it to go GA in OpenShift 4.18 we wanted to get the APIs reviewed before GA.

This PR adds a new olm directory containing the Go types for the two APIs that would be introduced:

ClusterExtension
ClusterCatalog

Each API is under a separate sub-directory in this PR because while they are fairly coupled, the APIs live in separate upstream projects:

The ClusterExtension API lives in the upstream project https://github.com/operator-framework/operator-controller (which has a downstream equivalent at https://github.com/openshift/operator-framework-operator-controller) and as such I named the subdirectory operator-controller
The ClusterCatalog API lives in the upstream project https://github.com/operator-framework/catalogd (which has a downstream equivalent at https://github.com/openshift/operator-framework-catalogd) and as such I named the subdirectory catalogd

Also of note for reviewers, there is a portion of the ClusterCatalog API that is still a work-in-progress, but was included in this PR with what we anticipate it to look like. I will leave PR comments on the exact locations with references to the design document we have in place for the API change and functionality intended.

Signed-off-by: everettraven <[email protected]>

openshift-ci · 2024-10-17T14:07:00Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

openshift-ci · 2024-10-17T14:07:03Z

Hello @everettraven! Some important instructions when contributing to openshift/api:
API design plays an important part in the user experience of OpenShift and as such API PRs are subject to a high level of scrutiny to ensure they follow our best practices. If you haven't already done so, please review the OpenShift API Conventions and ensure that your proposed changes are compliant. Following these conventions will help expedite the api review process for your PR.

openshift-ci · 2024-10-17T14:08:28Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: everettraven
Once this PR has been reviewed and has the lgtm label, please assign joelspeed for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

olm/catalogd/v1alpha1/clustercatalog_types.go

olm/operator-controller/v1alpha1/clusterextension_types.go

Signed-off-by: everettraven <[email protected]>

deads2k · 2024-10-22T13:44:24Z

olm/operator-controller/v1alpha1/clusterextension_types.go

+	// When set to "Enabled", the CRD Upgrade Safety pre-flight check will be run when
+	// performing an upgrade operation.
+	//
+	//+kubebuilder:validation:Enum:="Enabled";"Disabled"


Enabled/Disabled should be replaced with descriptive levels like

InformationalOnly

Strict

Could later add: CriticalOnly.

We talked about this for awhile yesterday and got hung up on InformationalOnly because we didn't think a user would actually be able see the information on the API.

We ended up settling on: CRDUpgradeSafetyEnforcement with None and Strict as options, and we could later add Permissive, CriticalOnly, etc. WDYT?

For InformationalOnly is there recommended way to provide that information to a user? Since this is an edge-based check, we couldn't immediately think of a way to put this in the API in a way that it would "stick". We mentioned recording an Event, but those are ephemeral.

For now, we landed on "None", which means "don't even run the checks", and in our controller we will log what the enforcement enum was set to when we reconcile so that we have more information to help reconstruct why a cluster might be borked when a customer reports an issue with a ClusterExtension.

olm/catalogd/v1alpha1/clustercatalog_types.go

deads2k · 2024-10-22T14:04:33Z

olm/catalogd/v1alpha1/clustercatalog_types.go

+	// When omitted, the image will not be polled for new content.
+	// +kubebuilder:validation:Format:=duration
+	// +optional
+	PollInterval *metav1.Duration `json:"pollInterval,omitempty"`


lower bound on the poll interval

1 minute as the lower bound sounds good as discussed.

1 minute means we have an e2e that will take ~1 minute to run, instead of the current configuration, which sets a poll interval as 1 second, and the test finishes in a couple of seconds.

That's not a good motivation for the design of an API, but just something for us to consider on the e2e side of things. Maybe we could run this test in parallel with others?

Testing at a different poll interval than you'd expect real life users to leverage is likely to either expose bugs or mask them, so I'd encourage testing at a real-ish poll interval where possible

Out of interest, what do you expect the "default" or average to be? Every hour?

The "default" that we set on the ClusterCatalog resources that ship as part of the OCP payload will be 10 minutes, which is the same as what the existing OLM v0 CatalogSource resources that ship as part of the OCP payload.

olm/catalogd/v1alpha1/clustercatalog_types.go

JoelSpeed

There are a lot of places where my comments would/are duplicated, I didn't capture them all as I didn't want to constantly repeat myself.

For the most part I think the structure looks good, though I've left some comments where we might need more thought about structure.

The majority of what I've left is suggestions on the godocs and what I would consider missing validations.

olm/catalogd/v1alpha1/clustercatalog_crd.yaml

olm/catalogd/v1alpha1/clustercatalog_types.go

olm/operator-controller/v1alpha1/clusterextension_types.go

olm/catalogd/v1alpha1/clustercatalog_types.go

JoelSpeed · 2024-10-23T10:57:11Z

olm/operator-controller/v1alpha1/clusterextension_types.go

+)
+
+// ClusterExtensionSpec defines the desired state of ClusterExtension
+type ClusterExtensionSpec struct {


Source and Install could well be interpreted to be quite similar, eg, why is the source not part of the installation configuration, I wonder if changing the names might make this more obvious, or perhaps, source should be within install? Or the fields from install should just be promoted to the spec?

The loose idea is that there are stages:

source the bundle from $somewhere (for now, from a catalog. in the future, perhaps from a direct reference or from a helm repo)

(not implemented yet) template/render the bundle using cluster-admin defined configuration

install it (i.e. change the cluster to reflect the result of (2))

So the idea was to have logical groupings of fields in the API so that fields from different concepts would not be interspersed.

Just a note that a couple of the install fields have been promoted to the spec.

Right, I wonder if the details of that flow can somehow be represented in the API field documentation so that it's easier to understand, but agreed, the logical grouping makes sense

JoelSpeed · 2024-10-23T11:17:20Z

olm/operator-controller/v1alpha1/clusterextension_types.go

+	//+kubebuilder:validation:MaxLength:=64
+	//+kubebuilder:validation:Pattern=`^(\s*(=||!=|>|<|>=|=>|<=|=<|~|~>|\^)\s*(v?(0|[1-9]\d*|[x|X|\*])(\.(0|[1-9]\d*|x|X|\*]))?(\.(0|[1-9]\d*|x|X|\*))?(-([0-9A-Za-z\-]+(\.[0-9A-Za-z\-]+)*))?(\+([0-9A-Za-z\-]+(\.[0-9A-Za-z\-]+)*))?)\s*)((?:\s+|,\s*|\s*\|\|\s*)(=||!=|>|<|>=|=>|<=|=<|~|~>|\^)\s*(v?(0|[1-9]\d*|x|X|\*])(\.(0|[1-9]\d*|x|X|\*))?(\.(0|[1-9]\d*|x|X|\*]))?(-([0-9A-Za-z\-]+(\.[0-9A-Za-z\-]+)*))?(\+([0-9A-Za-z\-]+(\.[0-9A-Za-z\-]+)*))?)\s*)*$`
+	//+optional
+	Version string `json:"version,omitempty"`


Would this be easier to reason about, if you created a structure to capture the version constraints, rather than trying to regex match and explain this massive validation?

For example, just splitting this into a list of strings, and saying that a match of any item in the list is considered a match, would remove the need to account for the OR in this.

Thinking about the ~ and ^ usage, is that not effectively a form of pinning to either the major, minor or patch versions of the version?

Could a field that was an enum represent that choice? upgradePinning: Major | Minor represent something equivalent? (I don't think you need patch)

One of my concerns about the validation here at the moment, is that there are many ways to express the same thing

What if this was something like

version: - base: 1.2.0 # This would be ">=1.2.0, < 1.3.0" or "1.2.x" or "~1.2.0" in the current form operator: GreaterOrEqual pinnedAt: Minor

Jordan, Lala, Bryce, and I chatted about this for a while yesterday. Our conclusion is that we believe masterminds implements a standard version range syntax used in other well-known parts of the Kubernetes ecosystem (e.g. helm install --version) and handles edge cases well (e.g. pre-release versions are only included in the range if the range itself specifies versions with pre-release values), so we were all agreed that we desired for the implementation to remain.

With that settled, the next questions we discussed were:

Can we confidently translate from a syntax like the one you are proposing back into masterminds syntax?

Are we sure that our pre-translation API would be expressive enough to cover all of the possible masterminds constraint strings? If it was, would that API still be more intuitive than a version range string?

For (1), we thought that we probably could, but it would require an extensive test suite that we would need to maintain. For (2), we were not confident.

Lastly, we talked about whether we thought there was a significant enough UX gain by going with a structured syntax, and we didn't think there was. We believe that the GoDoc on the field, the alignment with a well-known library, and the simplicity of a single string might actually be overall more intuitive than a verbose structure would be.

Ok, that's fair, as this is already relatively well used, I can see the argument for keeping it.

I would encourage though, to try breaking down the pattern validation into many CEL rules that not only would make it easier to maintain (eg you can split on comma/|| and the process the constraints for the strings within) but would also allow more specific error messages potentially

@grokspawn Had mentioned running into some issues with breaking this down into individual CEL rules. I don't recall the exact issues off the top of my head.

JoelSpeed · 2024-10-23T11:19:37Z

olm/operator-controller/v1alpha1/clusterextension_types.go

+	// # OR Comparisons
+	// You can use the "||" character to represent an OR operation in the version
+	// range. Some examples:
+	//   - ">=1.2.3, <2.0.0 || >3.0.0"


If I wanted to exclude a specific version from being valid, do I then just have to create ORs on a range around the version I want to exclude?

You could do something like this:

version: "2.x, !=2.1.2"

Signed-off-by: everettraven <[email protected]>

grokspawn · 2024-11-05T19:06:36Z

here is example output for describing the ClusterExtension CRD

Signed-off-by: everettraven <[email protected]>

grokspawn · 2024-11-11T21:24:33Z

Hey @JoelSpeed , @deads2k just pinging you here to let you know that @everettraven updated this PR with our latest set of changes, and to reflect our understanding vis-a-vis resolved comments.

Please take another look when you have a chance.

I'll ping elsewhere as well, but covering all my bases.

Thanks!

wip: olmv1 API review

19bf695

Signed-off-by: everettraven <[email protected]>

openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 17, 2024

openshift-ci bot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Oct 17, 2024

everettraven commented Oct 17, 2024

View reviewed changes

olm/catalogd/v1alpha1/clustercatalog_types.go Outdated Show resolved Hide resolved

everettraven commented Oct 17, 2024

View reviewed changes

olm/catalogd/v1alpha1/clustercatalog_types.go Outdated Show resolved Hide resolved

JoelSpeed reviewed Oct 18, 2024

View reviewed changes

everettraven changed the title ~~WIP: OLM v1 API review~~ [DNM] (review-only) WIP: OLM v1 API review Oct 18, 2024

wip: add generated CRDs

2d9e75f

Signed-off-by: everettraven <[email protected]>

openshift-ci bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Oct 21, 2024

wip: update clustercatalog types to match recent upstream changes

09345cc

Signed-off-by: everettraven <[email protected]>

deads2k reviewed Oct 22, 2024

View reviewed changes

grokspawn reviewed Oct 22, 2024

View reviewed changes

olm/catalogd/v1alpha1/clustercatalog_types.go Outdated Show resolved Hide resolved

grokspawn reviewed Oct 22, 2024

View reviewed changes

olm/catalogd/v1alpha1/clustercatalog_types.go Outdated Show resolved Hide resolved

deads2k reviewed Oct 22, 2024

View reviewed changes

olm/catalogd/v1alpha1/clustercatalog_types.go Outdated Show resolved Hide resolved

deads2k reviewed Oct 22, 2024

View reviewed changes

olm/catalogd/v1alpha1/clustercatalog_types.go Outdated Show resolved Hide resolved

deads2k reviewed Oct 22, 2024

View reviewed changes

grokspawn reviewed Oct 22, 2024

View reviewed changes

olm/catalogd/v1alpha1/clustercatalog_types.go Outdated Show resolved Hide resolved

JoelSpeed reviewed Oct 22, 2024

View reviewed changes

olm/catalogd/v1alpha1/clustercatalog_types.go Outdated Show resolved Hide resolved

deads2k reviewed Oct 22, 2024

View reviewed changes

olm/catalogd/v1alpha1/clustercatalog_types.go Outdated Show resolved Hide resolved

grokspawn reviewed Oct 22, 2024

View reviewed changes

olm/catalogd/v1alpha1/clustercatalog_types.go Outdated Show resolved Hide resolved

LalatenduMohanty mentioned this pull request Oct 22, 2024

External API review submission operator-framework/operator-controller#1358

Closed

JoelSpeed reviewed Oct 23, 2024

View reviewed changes

updates to ClusterCatalog API based on review feedback

a5222d3

Signed-off-by: everettraven <[email protected]>

everettraven mentioned this pull request Nov 6, 2024

⚠️ API updates based on external review operator-framework/catalogd#443

Merged

grokspawn mentioned this pull request Nov 6, 2024

⚠️ updates from api audit operator-framework/operator-controller#1404

Merged

4 tasks

everettraven added 3 commits November 7, 2024 09:32

more clustercatalog updates

474f805

Signed-off-by: everettraven <[email protected]>

clusterextension updates

a068e07

Signed-off-by: everettraven <[email protected]>

more clusterextension updates

ff93f13

Signed-off-by: everettraven <[email protected]>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[DNM] (review-only) WIP: OLM v1 API review #2067

[DNM] (review-only) WIP: OLM v1 API review #2067

everettraven commented Oct 17, 2024 •

edited

Loading

openshift-ci bot commented Oct 17, 2024

openshift-ci bot commented Oct 17, 2024

openshift-ci bot commented Oct 17, 2024

deads2k Oct 22, 2024

joelanford Nov 5, 2024

deads2k Oct 22, 2024

LalatenduMohanty Oct 22, 2024

joelanford Oct 28, 2024

JoelSpeed Oct 28, 2024

everettraven Nov 4, 2024 •

edited

Loading

JoelSpeed left a comment

JoelSpeed Oct 23, 2024

joelanford Nov 4, 2024

everettraven Nov 8, 2024

JoelSpeed Nov 27, 2024

JoelSpeed Oct 23, 2024

joelanford Nov 5, 2024

JoelSpeed Nov 27, 2024

everettraven Nov 27, 2024

JoelSpeed Oct 23, 2024

joelanford Nov 5, 2024 •

edited

Loading

grokspawn commented Nov 5, 2024

grokspawn commented Nov 11, 2024

[DNM] (review-only) WIP: OLM v1 API review #2067

Are you sure you want to change the base?

[DNM] (review-only) WIP: OLM v1 API review #2067

Conversation

everettraven commented Oct 17, 2024 • edited Loading

openshift-ci bot commented Oct 17, 2024

openshift-ci bot commented Oct 17, 2024

openshift-ci bot commented Oct 17, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

everettraven Nov 4, 2024 • edited Loading

Choose a reason for hiding this comment

JoelSpeed left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

joelanford Nov 5, 2024 • edited Loading

Choose a reason for hiding this comment

grokspawn commented Nov 5, 2024

grokspawn commented Nov 11, 2024

everettraven commented Oct 17, 2024 •

edited

Loading

everettraven Nov 4, 2024 •

edited

Loading

joelanford Nov 5, 2024 •

edited

Loading