Model Evolution #121

naegelejd · 2023-12-15T17:33:13Z

No description provided.

naegelejd · 2023-12-15T18:38:41Z

The current approach in yardl for supporting model evolution involves:

1. Comparing the current model to each previous version
2. Annotating the current model with each detected change
3. Processing the annotations to emit user warnings/errors
4. In codegen, inspecting annotations to:
   1. Serialize previous versions of TypeDefinitions and Protocols
   2. Convert between types where necessary
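
The compare, annotate, and report steps above can be pictured roughly as follows. This is an illustrative C++ sketch only, not the Go code in tooling/pkg/dsl/evolution.go. The names `Field`, `Record`, `FieldChange`, `CompareRecords`, and `Diagnose` are hypothetical, and the rule that an unconverted type change is an error is an assumption made for the example.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified model: a record is an ordered list of named fields.
struct Field {
  std::string name;
  std::string type;
  bool optional;
};

struct Record {
  std::string name;
  std::vector<Field> fields;
};

enum class ChangeKind { kAdded, kRemoved, kTypeChanged };

// One detected change, attached to the current model as an "annotation".
struct FieldChange {
  ChangeKind kind;
  Field field;  // the current field, or the old field for removals
};

// Steps 1-2: compare the current record to one previous version and collect
// annotations. Fields are matched by name, so reordering is not a change.
std::vector<FieldChange> CompareRecords(const Record& current, const Record& previous) {
  assert(current.name == previous.name);  // sketch assumes records are already paired

  std::map<std::string, Field> old_fields;
  for (const auto& f : previous.fields) old_fields[f.name] = f;

  std::vector<FieldChange> changes;
  for (const auto& f : current.fields) {
    auto it = old_fields.find(f.name);
    if (it == old_fields.end()) {
      changes.push_back({ChangeKind::kAdded, f});
    } else {
      if (it->second.type != f.type) changes.push_back({ChangeKind::kTypeChanged, f});
      old_fields.erase(it);
    }
  }
  // Anything left over has no counterpart in the current record.
  for (const auto& kv : old_fields) changes.push_back({ChangeKind::kRemoved, kv.second});
  return changes;
}

struct Diagnostics {
  std::vector<std::string> warnings;
  std::vector<std::string> errors;
};

// Step 3: turn annotations into user-facing messages, collecting all of them
// rather than stopping at the first problem.
Diagnostics Diagnose(const Record& rec, const std::vector<FieldChange>& changes) {
  Diagnostics d;
  for (const auto& c : changes) {
    const std::string where = rec.name + "." + c.field.name;
    switch (c.kind) {
      case ChangeKind::kAdded:
        if (!c.field.optional) d.warnings.push_back("added non-Optional field " + where);
        break;
      case ChangeKind::kRemoved:
        if (!c.field.optional) d.warnings.push_back("removed non-Optional field " + where);
        break;
      case ChangeKind::kTypeChanged:
        // Assumption for this sketch: no conversion is known for the new type.
        d.errors.push_back("type of " + where + " changed without a known conversion");
        break;
    }
  }
  return d;
}
```

In this sketch, adding an Optional field or reordering fields produces no diagnostics, while adding or removing a non-Optional field produces a warning.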

Details on this rough draft PR:

- The relevant changes are in:
  1. `tooling/pkg/dsl/evolution.go`
  2. `tooling/internal/cpp/include/detail/binary/header.h`
  3. `tooling/internal/cpp/include/detail/binary/reader_writer.h`
  4. `tooling/internal/cpp/protocols/protocols.go`
  5. `tooling/internal/cpp/binary/binary.go`
- There is still much to be done in `evolution.go` but I have a good handle on that. Examples:
  - Consider NOT using Annotations to capture schema changes (it works fine, but it's verbose and error prone)
  - Correct conversions for scalar <-> Union type changes
  - Handle Union <-> Union type changes (e.g. adding/removing a Type)
  - Capturing TypeDefinition changes, e.g. to warn about added/removed non-Optional Record fields
  - Detect changes to TypeArguments
  - Other TODOs in code
- The changes to the included C++ binary headers distinguish between `schema_` and `previous_schemas_` only to avoid breaking the NDJson and HDF5 code. This would be cleaned up and probably use just a single vector of schemas.
- Codegen is not using the version label specified in the package file. Once the schema is known by the Protocol Reader/Writer, it just uses the schema index to determine which serializers to call.
- Need to determine the best way for a User to instantiate a Protocol Writer for an older version of a Protocol.
  - Currently, the User must have instantiated a Protocol Reader `r` using an older schema, then say `MyProtocolWriter w(stream, r.GetSchema())`
  - We could generate unique constructors for each version, thereby utilizing the version label specified in the package file.
- Binary codegen needs a bit more cleanup to remove duplicate code for type conversions. Thoughts on the `switch(schema_index_) {...}` approach?
- The example models and C++ code (within `evolution/`) are just a starting point for integration tests.
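
To make the `switch(schema_index_)` question above concrete, here is a hand-written sketch of the idea using a single list of schemas. It is not the generated code: `HeaderV0`, `Header`, `Convert`, `FindSchemaIndex`, `ReadHeader`, the schema strings, and the whitespace-separated text "serialization" are all hypothetical stand-ins chosen for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <optional>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical record as defined in the previous model version.
struct HeaderV0 {
  int id;
  std::string name;
};

// Hypothetical current record: fields reordered, id widened, Optional field added.
struct Header {
  std::string name;
  long id;
  std::optional<std::string> note;
};

// Conversion between versions: widen `id`, leave the new Optional field empty.
Header Convert(const HeaderV0& old) {
  return Header{old.name, old.id, std::nullopt};
}

// All known schemas live in one list: index 0 is the current schema,
// higher indices are previous versions.
std::size_t FindSchemaIndex(const std::vector<std::string>& schemas,
                            const std::string& found) {
  assert(!schemas.empty());
  for (std::size_t i = 0; i < schemas.size(); ++i) {
    if (schemas[i] == found) return i;
  }
  throw std::runtime_error("stream schema does not match any known version");
}

// Stand-ins for per-version serializers.
HeaderV0 ReadHeaderV0(std::istream& in) {
  HeaderV0 h{};
  in >> h.id >> h.name;  // old field order
  return h;
}

Header ReadHeaderCurrent(std::istream& in) {
  Header h{};
  in >> h.name >> h.id;  // current field order
  std::string note;
  if (in >> note) h.note = note;  // Optional field: present only if data follows
  return h;
}

// The schema index alone selects which serializer (and conversion) to call.
Header ReadHeader(std::istream& in, std::size_t schema_index) {
  switch (schema_index) {
    case 0:
      return ReadHeaderCurrent(in);
    case 1:
      return Convert(ReadHeaderV0(in));
    default:
      throw std::invalid_argument("unknown schema index");
  }
}
```

A caller would look up the index once from the schema found in the stream, for example `FindSchemaIndex(schemas, schema_from_stream)`, and pass it to every read. Keeping each conversion in its own function, as `Convert` does here, is one possible way to avoid repeating conversion logic in every `case`.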

naegelejd self-assigned this Dec 21, 2023

naegelejd added 19 commits December 26, 2023 10:48

Add initial model evolution strict comparison

0b965b4

Refactor evolution to annotate model changes

7e02ba2

This includes compatibility support for C++ Binary protocols codegen only, including reordering Record fields and adding new Optional fields to records.

Add initial infrastructure for type conversions

64207ee

This helps support many possible schema changes.

Change evolution type functions to return errors

0d401dc

But FYI I'm going to revert this to capture ALL errors at the top level

Add type conversions for Protocol Steps

db41115

Also aggregate warnings and errors for schema changes

Remove superseded strict model comparison code

8b8a1ca

Update generated C++ unittest code

e34cc37

Add example model evolution integration tests

1eb7c05

Fixes for CI and correct evolution codegen location

61c7e39

Ignore: Remove carriage returns from serializers.h

e048def

Clean up type conversion calls in binary codegen

76e8c32

Organize these silly annotations

72294f7

Make TypeChange an interface to support conversions

4a326c9

Fix evolution GeneralizedType changes

89e88cb

Also added support for changing the inner type of an Optional

Only compare TypeDefinitions used within a Protocol

b6b91eb

Add version selection for Binary Reader/Writer

3025710

Write versioned serializers in deterministic order

b4e0566

Handle Definition changes more robustly

5c408f1

Also started adding support for Union<->Union changes

Add unit and integration tests for model evolution

9ebb3b3

naegelejd force-pushed the naegelejd/evolution branch from f38e9d2 to 9ebb3b3 Compare January 3, 2024 22:34

naegelejd closed this Feb 6, 2024
