Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Model Evolution #121

Closed
wants to merge 19 commits into from
Closed

Model Evolution #121

wants to merge 19 commits into from

Conversation

naegelejd
Copy link
Contributor

No description provided.

@naegelejd
Copy link
Contributor Author

naegelejd commented Dec 15, 2023

The current approach in yardl for supporting for model evolution involves:

  1. Comparing the current model to each previous version
  2. Annotating the current model with each detected change
  3. Processing the annotations to emit user warnings/errors
  4. In codegen, inspecting annotations to
    1. Serialize previous versions of TypeDefinitions and Protocols
    2. Convert between types where necessary

Details on this rough draft PR:

  1. The relevant changes are in:

    1. tooling/pkg/dsl/evolution.go
    2. tooling/internal/cpp/include/detail/binary/header.h
    3. tooling/internal/cpp/include/detail/binary/reader_writer.h
    4. tooling/internal/cpp/protocols/protocols.go
    5. tooling/internal/cpp/binary/binary.go
  2. There is still much to be done in evolution.go but I have a good handle on that. Examples:

    • Consider NOT using Annotations to capture schema changes (it works fine, but it's verbose and error prone)
    • Correct conversions for scalar <-> Union type changes
    • Handle Union <-> Union type changes (e.g. adding/removing a Type)
    • Capturing TypeDefinition changes, e.g. to warn about added/removed non-Optional Record fields
    • Detect changes to TypeArguments
    • Other TODOs in code
  3. The changes to the included C++ binary headers distinguish between schema_ and previous_schemas_ only to avoid breaking the NDJson and HDF5 code. This would be cleaned up and probably use just a single vector of schemas.

  4. Codegen is not using the version label specified in the package file. Once the schema is known by the Protocol Reader/Writer, it just uses the schema index to determine which serializers to call.

  5. Need to determine the best way for a User to instantiate a Protocol Writer for an older version of a Protocol.
    Currently, the User must have instantiated a Protocol Reader r using an older schema, then say MyProtocolWriter w(stream, r.GetSchema())

    • We could generate unique constructors for each version, thereby utilizing the version label specified in the package file.
  6. Binary codegen needs a bit more cleanup to remove duplicate code for type conversions. Thoughts on the switch(schema_index_) {...} approach?

  7. The example models and C++ code (within evolution/) are just a starting point for integration tests

@naegelejd naegelejd self-assigned this Dec 21, 2023
This includes compatibility support for C++ Binary protocols codegen only,
including reordering Record fields and adding new Optional fields to records.
This helps support many possible schema changes.
But FYI I'm going to revert this to capture ALL errors at the top level
Also aggregate warnings and errors for schema changes
Also added support for changing the inner type of an Optional
Also started adding support for Union<->Union changes
@naegelejd naegelejd force-pushed the naegelejd/evolution branch from f38e9d2 to 9ebb3b3 Compare January 3, 2024 22:34
@naegelejd naegelejd closed this Feb 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant