Rework models #395
Conversation
@gtucker despite the pending tasks before getting this ready for merging, I think it's worth having a look at it and discussing the possible drawbacks. I think this is a good solution and it fixes the needs that we had regarding model types, validation and endpoint ergonomics. To do:
@nuclearcat sure, we can do that. Ultimately, it's a matter of finding the right balance between query performance (embedding node data into other nodes) and DB size (saving node pointers instead). Also, in cases where the pointed-to data is prone to change, I don't think we should embed it into chained nodes. But in this particular case the data should be stable, so it'll work. I'll make the changes. Thanks for reviewing!
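The trade-off discussed above can be illustrated with two hypothetical node documents (all field values here are made up for illustration):

```python
# Hypothetical illustration of the embed-vs-pointer trade-off: a child node
# can either carry a copy of the parent's data or keep only a pointer to it.

embedded_kbuild = {
    "kind": "kbuild",
    "name": "kbuild-gcc-x86",
    # Embedded copy: faster queries (no extra lookup), larger documents,
    # and it can go stale if the pointed-to data ever changes.
    "data": {"kernel_revision": {"tree": "mainline", "commit": "deadbeef"}},
}

pointer_kbuild = {
    "kind": "kbuild",
    "name": "kbuild-gcc-x86",
    # Pointer only: smaller documents, but reading the revision costs an
    # extra lookup on the parent checkout node.
    "parent": "64f1c0ffee",  # made-up node id
    "data": {},
}
```

Since a checkout's revision is stable once created, embedding it in descendant nodes is safe here.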
Node objects no longer contain a 'revision' field. Signed-off-by: Ricardo Cañuelo <[email protected]>
Implement a mechanism for dynamic polymorphism on Node objects by using explicit pydantic object validation depending on the node kind and storing all Node objects as plain nodes in the same DB collection regardless of their type. Signed-off-by: Ricardo Cañuelo <[email protected]>
Signed-off-by: Ricardo Cañuelo <[email protected]>
Perform node validation for the /nodes PUT endpoint. Signed-off-by: Ricardo Cañuelo <[email protected]>
Sync test_create_node_endpoint to the latest model changes. Signed-off-by: Ricardo Cañuelo <[email protected]>
Sync test_get_nodes_by_attributes_endpoint to the latest model changes. Signed-off-by: Ricardo Cañuelo <[email protected]>
Sync test_get_node_by_id_endpoint to the latest model changes. Signed-off-by: Ricardo Cañuelo <[email protected]>
Sync test_get_all_nodes to the latest model changes. Signed-off-by: Ricardo Cañuelo <[email protected]>
Sync regression e2e tests to the latest model changes. Signed-off-by: Ricardo Cañuelo <[email protected]>
Signed-off-by: Ricardo Cañuelo <[email protected]>
Signed-off-by: Ricardo Cañuelo <[email protected]>
Sync DB to model changes after these commits:
- api.models: basic definitions of Node submodels
- api.main: use node endpoints for all types of Node subtypes
- api.db: remove the regression collection

Signed-off-by: Ricardo Cañuelo <[email protected]>
Tested manually on staging and followed all the logs to make sure everything works.
NOTE: These changes will break an API instance that is connected to a DB with existing nodes, since they change the node models. The DB migration script included in the changes handles the data migration cleanly (hopefully).
Related PRs:
kernelci/kernelci-core#2224 and this PR are interdependent, so they should be merged together. I suggest merging kernelci/kernelci-core#2224 first, then this one, and then kernelci/kernelci-pipeline#365. Remember to apply the migration scripts.
This PR has already been tested with the related PR applied, and all tests pass.
Definition of Node and Node submodels
This is the main change of this PR. The initial requirements that led to this are:
These requirements fit well in an object-oriented design where a base model (`Node`) provides the base graph structure as well as other metadata attributes (dates, ownership, etc.), and subclasses of this model provide the specific attributes that are relevant to each object type.

A way to implement this so that we can carry the object-oriented nature throughout the application is to define all the subclass-specific attributes inside a `data` field, which will be a dict with arbitrary data from the point of view of the base class `Node` but will be defined with specific fields in each concrete subclass model.

In this scheme, `Node` is almost like an abstract class (we can create plain `Node`s, but the contents of the `data` field won't be validated), and the submodels will be the concrete implementation subclasses that will be fully validated depending on their type, which is encoded in the `kind` field.

This way we can effectively implement dynamic polymorphism in data storage and retrieval. The lower layers (DB) aren't aware of the different `Node` types, as they all share the same structure, the only differences being the contents of the `data` dict. All nodes are thus retrieved as plain `Node` objects, which are then interpreted and validated as concrete types by the API.
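As a rough sketch of this scheme (assuming pydantic; the class names, fields and the `parse_node` helper below are illustrative, not necessarily what the PR implements):

```python
# Sketch of kind-based dynamic polymorphism over a single Node shape.
# Assumes pydantic; names and fields are illustrative.
from typing import Any, Dict, Optional
from pydantic import BaseModel, Field


class Node(BaseModel):
    """Base model: graph structure plus metadata, with free-form data."""
    id: Optional[str] = None
    kind: str = "node"
    name: str
    parent: Optional[str] = None  # pointer to the parent node
    data: Dict[str, Any] = Field(default_factory=dict)  # not validated here


class CheckoutData(BaseModel):
    """Concrete schema for a checkout's 'data' contents."""
    kernel_revision: Dict[str, Any]


class Checkout(Node):
    kind: str = "checkout"
    data: CheckoutData  # 'data' is now fully validated


# Map each 'kind' value to its concrete model so retrieval can re-validate.
KIND_MAP = {"node": Node, "checkout": Checkout}


def parse_node(obj: dict) -> Node:
    """Validate a raw DB document as its concrete Node subtype."""
    model = KIND_MAP.get(obj.get("kind", "node"), Node)
    return model.parse_obj(obj)
```

Every document lives in the same collection with the same top-level shape; only the interpretation of `data` changes with `kind`.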
New models
Checkout

The specific data of a checkout will be the `kernel_revision`.

Kbuild

The specific data of a kernel build will be the `kernel_revision`, `arch`, `defconfig`, `compiler` and a list of extra configuration `fragments`.

NOTE: if a kernel build node will always hang from a checkout node, then the `kernel_revision` can be retrieved from its parent node.
Test

The specific data of a test will be the `kernel_revision` and the information about the test: source/repo and revision.

NOTE: if a test node will always hang from a kbuild node, then the `kernel_revision` can be retrieved from its parent node.
Regression
The specific data of a regression will be the node that introduced the failure and the previous one that passed, i.e., a regression is defined by a breaking point between nodes.
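The per-kind `data` submodels described above could look roughly like this (assuming pydantic; field names follow the text but the actual PR may name them differently):

```python
# Illustrative sketch of the per-kind "data" submodels; assumes pydantic.
# Field names follow the PR description but are not guaranteed to match it.
from typing import Any, Dict, List
from pydantic import BaseModel, Field


class KbuildData(BaseModel):
    """Specific data of a kernel build node."""
    kernel_revision: Dict[str, Any]
    arch: str
    defconfig: str
    compiler: str
    fragments: List[str] = Field(default_factory=list)  # extra config fragments


class TestData(BaseModel):
    """Specific data of a test node: revision plus the test's source info."""
    kernel_revision: Dict[str, Any]
    test_source: str    # source/repo of the test (hypothetical field name)
    test_revision: str  # revision of the test (hypothetical field name)


class RegressionData(BaseModel):
    """A regression is a breaking point between two nodes."""
    fail_node: str  # id of the node that introduced the failure
    pass_node: str  # id of the previous node that passed
```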
Endpoint changes
The endpoints no longer map a node `kind` to a DB collection. All concrete `Node` objects are saved as `Node`s. The difference lies in the implementation of the operations that create nodes, which now perform object validation explicitly.

This means that we don't need a different endpoint for each node type, which leads to a simpler and cleaner implementation that takes advantage of the object-oriented design where possible. The drawback is that each node POST/PUT is now validated twice: once as a `Node` and once as a concrete `Node` subtype. I don't know if this could have a noticeable performance impact, but IMO performance doesn't seem to be a critical aspect of the system (we're using Python for everything, after all), and the pros of this implementation far outweigh the cons.

The "regression" endpoints are removed since all `Node` subtypes can now work through the "node" endpoints.
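The double-validation path could be sketched like this (a hypothetical `create_node` helper, assuming pydantic; the real endpoint code may differ):

```python
# Sketch of the double validation a single POST/PUT node path performs:
# first as a base Node, then as the concrete subtype selected by 'kind'.
# Assumes pydantic; all names here are illustrative.
from typing import Any, Dict
from pydantic import BaseModel, Field


class Node(BaseModel):
    kind: str = "node"
    name: str
    data: Dict[str, Any] = Field(default_factory=dict)


class CheckoutData(BaseModel):
    kernel_revision: Dict[str, Any]


class Checkout(Node):
    kind: str = "checkout"
    data: CheckoutData


KIND_MAP = {"node": Node, "checkout": Checkout}


def create_node(payload: dict) -> Node:
    """Single entry point for every kind: validate generically, then concretely."""
    node = Node.parse_obj(payload)        # first pass: base Node structure
    model = KIND_MAP.get(node.kind, Node)
    concrete = model.parse_obj(payload)   # second pass: concrete subtype
    # ...here the endpoint would store the document in the single
    # 'nodes' collection, regardless of its kind...
    return concrete
```

A malformed payload fails on the second pass, so invalid `data` never reaches the DB even though the collection itself is schemaless.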