Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a variety of key definitions surrounding stability #142

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

jack-berg
Copy link
Member

Related to #2.

Resolves #89.

Took inspiration from prior art in the opentelemetry-proto stability definition, but adapted for a variety of differences in the the domains of OTLP vs. declarative configuration, protobuf vs. JSON schema.

@jack-berg jack-berg requested a review from a team as a code owner November 26, 2024 22:59

All schema changes are considered through the lens of maintaining these objectives.

### Guarantees and allowed changes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole section is written as "absolute" terms.

In reality, every statement here can be invalid, when interpreted in a different, "relative" context.

By absolute/relative, I mean this:

The format is not just a contract between two (2) parties.

The format is a contract between three (3) parties, which are:

  • [SCHEMA] maintainers of the opentelemetry-configuration repository, when defining a yaml schema
  • [SIG] maintainers of a SIG, when parsing configuration from a yaml schema
  • [USER] maintainers of a given user configuration files, when writing a yaml schema configuration

I think it will be beneficial to details separately:

  • the contract and expectations between [SCHEMA] and [SIG]
  • the contract and expectations between [SCHEMA] and [USER]
  • the contract and expectations between [USER] and [SIG]

A claim as "Type property names will not change" is not absolute, some type names will change.

For a given schema version number, the type property name will not change even on minor file_format changes, sure, so this is true in the context of [USER] - [SIG], within some constraints on the file_format value.

This does not forbid name changes when evolving the schema definitions:

  • new schema can rename properties, affecting the [SCHEMA] - [SIG] part when parsing, for example on a major version change in file_format
  • new schema can rename properties, affecting the [SCHEMA] - [USER] part when writing a yaml config, also on major version change in file_format: users changing the file_format tag in their file are expected to adjust to the new format spec.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The format is a contract between three (3) parties, which are:

I'm not sure I completely understand what you mean by this. I think part of what you're saying is reflected in the objectives section, which explains that stability guarantees impact both a user writing YAML config (i.e. [SCHEMA] and [USER] in your terms), and implementations interpretting this config and publishing in-memory representations of the data model (i.e. [SCHEMA] and [SIG] in your terms).

A claim as "Type property names will not change" is not absolute, some type names will change.

What do you mean by this? What I mean to claim here is that a given property name (let's say the .tracer_provider property for example) which is part of a stable type (i.e. no */experimental suffix) will not change in minor or path versions.

new schema can rename properties, affecting the [SCHEMA] - [SIG] part when parsing, for example on a major version change in file_format

Yes, all changes are allowed in major version updates. Do we need to explicitly state this? I'd like to follow the lead of opentelemetry-proto's stability definitions and try to keep language as terse as is required by the language.

Do you think that it would help to explain that these stability requirements apply to minor version updates, and explain that major version updates can include carte blanche breaking changes.

Copy link
Member

@marcalff marcalff Nov 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem I see is the following:

A major update in the yaml format will have a massive impact:

  • all SDK implementations will have to be updated with the new schema
  • all the SIG will have to make a new release, including the SIG that are not concerned with the change in the first place (for example, because they do not implement all features, and the change that caused the major update is located in a part of the yaml schema which is not supported)
  • all the user applications will need to deploy the new SDK, for all languages
  • all user configurations will need to be updated with the new yaml version

When the root cause of the change is to say that:

exporter:
  jaeger:

is no longer valid, this has huge consequences for everybody, including SDK that did not supported jaeger in the first place, just because MAJOR changed from 1 to 2.

The text as written does a very good job of describing what a stable type is, it really does.

The problem is that when something affects the stability of a type, there should be a way to mitigate changes and provide a transition path to change the schema in a MINOR version instead of forcing a major version change.

Another example:

something:
  # constraints in the schema enforce a valid range of [0, 100]
  foo_value: 50

The day the value ranges needs to change to [10, 90] instead, this makes it a breaking change, forcing a major version update to the schema.

Instead, there should be a way to provide a reasonable migration path, like:

  • (a) schema 1.0.0 allows range [0, 100]
  • (b) schema 1.1.0 allows range [0, 100] in yaml, and document that the SDK implementation must:
    • increase values below 10 to 10,
    • decrease values above 90 to 90,
    • raise a warning when this happens
  • (c) schema 1.2.0 allows range [10, 90]

Item (b) is about stability garantees between [SCHEMA] and [SIG] as I called it, where the SIG implementation is to expect extra processing beyond yaml parsing, mandated by the schema.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A major update in the yaml format will have a massive impact:

I agree the impact is huge.

The impact is so huge that it creates a strong incentive to avoid a major version update. And if the rules are fairly (and necessarily) restrictive in terms of what is allowed in a minor version update, then it creates a strong incentive to model types correctly the first time.

Consider other areas where similar strict restrictions exist like opentelemetry-proto, or the specification, or language implementations like opentelemetry-java.

We have definitions of what is allowable in minor versions to protect users, and to promote careful / thoughtful design. In all these cases, a major version bump is something to strongly avoid. We write down when a major version is required with the hope that we don't have to do it.

For your examples:

  1. The first example is the removal of a known property / type, the jaeger exporter. This is considered a breaking change under the rules laid out in this PR. Luckily for us, jaeger is already gone, but the same could happen to other exporters, processors, samplers, etc. The fact that removing a type is a breaking change is an incentive to keep it some form (potentially deprecated) until the next major version, which could be indefinitely if there is never a strong enough reason. Its not an incentive to make invasive changes and bump the major version.
  2. The second example is the adjustment of constraints from allowing [0, 100] to [10, 90]. This is considered a breaking change under the rules laid out in this PR. I think this is correct. In the migration path you lay out, the change in behavior is broken into multiple minor versions, but this doesn't change the fact that a value of 5 was valid in 1.0.0, and became invalid in 1.2.0. No major version change occurred yet the user's input became invalid. I think this is wrong, regardless of whether or not it is broken into steps. For authors of the schema, the fact that making validation more strict requires a major version bump is an incentive to get the validation criteria correct up front. The PR lists all the different types of validation conditions present in JSON schema for completeness, but I really don't see us using those in most cases. minLength, maxLength, format, multipleOf and others seem like really niche keywords we really ought to avoid. I do anticipate (from experience so far) we'll use keywords like pattern, propertyNames, minProperties, maxProperties, required, and enum frequently. It seems reasonable to model these correctly up front. And if we're not sure, we can start off stricter and loosen the criteria as needed, since reducing strictness is allowed.

Item (b) is about stability garantees between [SCHEMA] and [SIG] as I called it, where the SIG implementation is to expect extra processing beyond yaml parsing, mandated by the schema.

On a somewhat unrelated note: Right now the extra processing responsibilities of the SIG implementation are sometimes implied and other times made explicit in comments. This seems flimsy, and error prone as SIG maintainers may miss some subtle change in the expected semantics. To me, this implies we should minimize these extra responsibilities.

README.md Outdated Show resolved Hide resolved
* [uniqueItems](https://json-schema.org/understanding-json-schema/reference/array#uniqueItems): will not go from `true` to `false`.
* [enum](https://json-schema.org/understanding-json-schema/reference/enum): will not remove entries.
* [const](https://json-schema.org/understanding-json-schema/reference/const): will not change.
* No existing type will be deleted.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assume a format 1.0.0 (stable) that contains the Jaeger exporter, which we all know has been deprecated and removed from the spec.

What this says is that the jeager exporter will never be removed from the yaml format in version 1.x.y, as this requires a major format change.

There must be a reasonable way to deprecate and remove features, and I don't think the following is shocking:

  • format 1.0.0 does support a jeager exporter
  • format 1.1.0 removes support for jaeger, and documents this as an incompatible breaking change

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There must be a reasonable way to deprecate and remove features, and I don't think the following is shocking:

I think there must be a way to deprecate features, but not removal. Deprecated features must linger on (potentially indefinitely) until a major version bump. This shifts the burden of understanding / dealing with schema evolution from users to maintainers, which is where it belongs.

Semantic conventions reached this same conclusion: https://opentelemetry.io/docs/specs/otel/versioning-and-stability/#deprecated

I believe (and hope) language SIGs reach this same conclusion. In java, we maintain code which we marked as stable but have deprecated with the same gaurantees as other stable code - no removal until next major version, which we have no plans of. Here's an example. We accidentally exposed a protected constructor, which is not aligned with our code style in the repository. Even though its unlikely anyone was / is using this, we've maintained it for 3 years, and will continue maintaining it.

* [required](https://json-schema.org/understanding-json-schema/reference/object#required): will not add additional entries.
* [contains, minContains, maxContains](https://json-schema.org/understanding-json-schema/reference/array#contains): will not become stricter.
* [minItems, maxItems](https://json-schema.org/understanding-json-schema/reference/array#length): `minItems` will not increase, `maxItems` will not decrease.
* [uniqueItems](https://json-schema.org/understanding-json-schema/reference/array#uniqueItems): will not go from `true` to `false`.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

false to true?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, good catch


* The `file_format` major version aligns with the implementation major version, AND:
* The `file_format` minor version is less than or equal to the implementation minor version: This is ideal, with versions maximally aligned. Despite this, an implementation might not support every property and type of its target version.
* The `file_format` minor version is greater than the implementation minor version: The implementation should detect and emit a warning since there may be configuration features the user specifies which the implementation does not understand. However, this is acceptable in many cases, and not terribly different from the ideal path where an implementation also might not support every configuration feature.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, as an example: minLength is 10 in 1.1.0 and minLength is 5 for 1.2.0 (hence relaxing the value). User of 1.2.0 expects 7 to be a valid value, but for 1.1.0 it is not. What do you do here? Generate a warning and accept the value? Or ignore it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

User of 1.2.0 expects 7 to be a valid value, but for 1.1.0 it is not.

So the user has specified that they are using 1.2.0 by setting file_format: 1.2. The implementation is conforming to 1.1.0. The implementation detects that the user's version is great than the implementation and emits a warning, alerting the user to the fact that a 1.1.0 implementation won't be able to understand some of the changes from a 1.2.0 file format.

In this case, the implementation rejects the value as its outside of the acceptable range according to the schema its aware of. If implementations stay up to date with opentelemetry-configuration releases, it will be much more common for the implementation's version to be greater than the users (which is ideal) than for the user's version to be greater than the implementations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Define stability guarantees for stable releases
3 participants