Rework models #395

Merged
merged 12 commits into from
Jan 11, 2024
Conversation

@r-c-n r-c-n commented Nov 3, 2023

NOTE: These changes will break an API instance that is connected to a DB with existing nodes, since this changes the node models. The db migration script included in the changes handles the data migration cleanly (hopefully).

Related PRs:

kernelci/kernelci-core#2224 and this PR are interdependent, so they should be merged together. I suggest kernelci/kernelci-core#2224 is merged first, then this one and then kernelci/kernelci-pipeline#365. Remember to apply the migration scripts.

This PR has already been tested with PR applied, with all tests passing.

Definition of Node and Node submodels

This is the main change of this PR. The initial requirements that led to this are:

  • We need objects to be properly defined and modeled, with pydantic offering automatic validation of their fields.
  • We want to maintain the tree (or graph) structure of the nodes while adding specific object types depending on the entities they model.

These requirements fit well in an object oriented design where a base model (Node) provides the base graph structure as well as other metadata attributes (dates, ownership, etc.), and subclasses of this model provide the specific attributes that are relevant to each object type.

A way to implement this so that we can carry the object oriented nature throughout the application is to define all the subclass-specific attributes inside a "data" field, which will be a dict with arbitrary data from the point of view of the base class Node but will be defined with specific fields in each concrete subclass model.

In this scheme, Node is almost like an abstract class (we can create plain Nodes, but the contents of the data field won't be validated), and the submodels will be the concrete implementation subclasses that will be fully validated depending on their type, which is encoded in the kind field.

This way we can effectively implement dynamic polymorphism in data storage and retrieval. The lower layers (DB) aren't aware of different Node types, as they all share the same structure with the only differences being the contents of the "data" dict. All nodes are thus retrieved as plain Node objects, which are then interpreted and validated as concrete types by the API.
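
As a rough illustration of the scheme described above, here is a minimal sketch of kind-based validation. It is not the code from this PR: to stay dependency-free it uses stdlib dataclasses as a stand-in for the pydantic models, so it only checks that the expected fields are present (pydantic would also check their types). The field names `name`, `parent` and `fragments`, the `DATA_MODELS` registry and the `validate_node` helper are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Type


@dataclass
class Node:
    """Base model: graph structure plus a 'data' dict that is opaque here."""
    kind: str
    name: str                      # hypothetical metadata field
    parent: Optional[str] = None   # hypothetical pointer to the parent node
    data: Dict[str, Any] = field(default_factory=dict)


@dataclass
class CheckoutData:
    """Fields expected inside 'data' when kind == 'checkout'."""
    kernel_revision: Dict[str, Any]


@dataclass
class KbuildData:
    """Fields expected inside 'data' when kind == 'kbuild'."""
    kernel_revision: Dict[str, Any]
    arch: str
    defconfig: str
    compiler: str
    fragments: List[str] = field(default_factory=list)


# Hypothetical registry mapping the 'kind' field to a concrete data model.
DATA_MODELS: Dict[str, Type] = {"checkout": CheckoutData, "kbuild": KbuildData}


def validate_node(raw: Dict[str, Any]) -> Node:
    """Validate once as a plain Node, then again according to its kind."""
    try:
        node = Node(**raw)  # first pass: base structure only
        model = DATA_MODELS.get(node.kind)
        if model is not None:
            model(**node.data)  # second pass: kind-specific fields
    except TypeError as exc:
        # dataclasses raise TypeError on missing or unexpected fields
        raise ValueError(f"invalid node: {exc}") from exc
    return node
```

A node whose kind has no registered model passes through as a plain Node with unvalidated data, which mirrors the "almost abstract" behaviour of the base class described above.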

New models

Checkout

The specific data of a checkout will be the kernel_revision.

Kbuild

The specific data of a kernel build will be the kernel_revision, arch, defconfig, compiler and list of extra configuration fragments.

NOTE: if a kernel build node will always hang from a checkout node, then the kernel_revision can be retrieved from its parent node.

Test

The specific data of a test will be the kernel_revision and the information about the test: source/repo and revision.

NOTE: if a test node will always hang from a kbuild node, then the kernel_revision can be retrieved from its parent node.

Regression

The specific data of a regression will be the node that introduced the failure and the previous one that passed, i.e. a regression is defined by a breaking point between nodes.
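
To make the "breaking point" idea concrete, the following sketch scans an ordered list of results and reports each pass-to-fail transition. It is only an illustration: the `id` and `result` keys, the `pass_node` / `fail_node` names and the `find_breaking_points` helper are assumptions, not the fields used by the actual Regression model.

```python
from typing import Dict, List


def find_breaking_points(history: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return one regression per pass -> fail transition in an ordered history.

    Each entry in `history` is a simplified node with hypothetical
    'id' and 'result' keys, ordered from oldest to newest.
    """
    regressions = []
    # Compare every node with the one immediately before it.
    for prev, curr in zip(history, history[1:]):
        if prev["result"] == "pass" and curr["result"] == "fail":
            regressions.append({"pass_node": prev["id"], "fail_node": curr["id"]})
    return regressions


history = [
    {"id": "a", "result": "pass"},
    {"id": "b", "result": "pass"},
    {"id": "c", "result": "fail"},
    {"id": "d", "result": "fail"},
]
print(find_breaking_points(history))
# prints [{'pass_node': 'b', 'fail_node': 'c'}]
```

Only the transition between "b" and "c" is reported; the repeated failure in "d" is not a new regression because the node before it had already failed.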

Endpoint changes

The endpoints no longer map a node kind to a DB collection. All concrete Node objects are saved as Nodes. The difference lies in the implementation of the operations that create nodes, which now perform object validation explicitly.

This means that we don't need a different endpoint for each node type. This leads to a simpler and cleaner implementation that takes advantage of the object-oriented design where possible. The drawback is that each node POST/PUT is now validated twice: once as a Node and once as a concrete Node subtype. I don't know if this could have a noticeable performance impact but, IMO, performance doesn't seem to be a critical aspect of the system (we're using python for everything, after all), and the pros of this implementation far outweigh the cons.

The "regression" endpoints are removed since all Node subtypes can now work through the "node" endpoints.
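
The single-collection idea can be sketched without any web framework. In the toy store below every node, whatever its kind, goes through the same `create` operation and lands in the same list, and the two validation passes are explicit. The `NodeStore` class, the required-field sets and the field names are hypothetical and only stand in for the real endpoints and pydantic models.

```python
from typing import Any, Dict, List, Set

# Hypothetical required fields for the base Node and for each kind's data.
BASE_FIELDS: Set[str] = {"kind", "name"}
DATA_FIELDS: Dict[str, Set[str]] = {
    "checkout": {"kernel_revision"},
    "regression": {"fail_node", "pass_node"},
}


class NodeStore:
    """One collection for every node kind, with validation done on write."""

    def __init__(self) -> None:
        self.nodes: List[Dict[str, Any]] = []

    def create(self, raw: Dict[str, Any]) -> Dict[str, Any]:
        # First validation: the payload must look like a plain Node.
        missing = BASE_FIELDS - raw.keys()
        if missing:
            raise ValueError(f"missing node fields: {sorted(missing)}")
        # Second validation: the 'data' dict must match the node kind.
        required = DATA_FIELDS.get(raw["kind"], set())
        missing = required - raw.get("data", {}).keys()
        if missing:
            raise ValueError(f"missing data fields: {sorted(missing)}")
        self.nodes.append(raw)  # same collection regardless of kind
        return raw

    def find(self, **attrs: Any) -> List[Dict[str, Any]]:
        # Kind is just another attribute to filter on.
        return [n for n in self.nodes
                if all(n.get(k) == v for k, v in attrs.items())]
```

With this layout, what used to be a dedicated "regression" query becomes an ordinary node query filtered by kind, for example `store.find(kind="regression")`.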

@r-c-n r-c-n force-pushed the rework-models branch 3 times, most recently from 8952151 to 5e68739 Compare November 3, 2023 14:30
r-c-n commented Nov 3, 2023

@gtucker despite the pending tasks before getting this ready for merging, I think it's worth it to have a look at it and discuss the possible drawbacks. I think this is a good solution and fixes the needs that we had regarding model types, validation and endpoint ergonomics.

To do:

  • Review / update "/nodes" PUT, possible changes in DB helper functions
  • Update unit tests
  • Proceed to update kernelci-pipeline and kernelci-core to sync with these changes

@nuclearcat
In some cases we might have tests daisy-chained, so a test might trigger another test. So I believe we need to include all kernel information in the test "data" field. This will be easier, since the test won't need to access parent nodes.
I think that for the sake of performance it is preferable (if possible) that each node event carries enough information to execute the job without doing additional lookups.

r-c-n commented Nov 4, 2023

@nuclearcat sure, we can do that. Ultimately, it's a matter of finding the right balance between query performance (embedding node data into other nodes) and DB size (saving node pointers instead). Also, in cases where the pointed data is prone to change I don't think we should embed it into chained nodes. But in this case in particular the data should be stable, so it'll work. I'll make the changes. Thanks for reviewing!

@r-c-n r-c-n added the staging-skip Don't test automatically on staging.kernelci.org label Nov 6, 2023
@r-c-n r-c-n force-pushed the rework-models branch 2 times, most recently from 72c004d to 8781bb1 Compare November 8, 2023 15:08
r-c-n commented Nov 8, 2023

  • Review / update "/nodes" PUT, possible changes in DB helper functions: DONE
  • Update unit tests and e2e tests: DONE

Ricardo Cañuelo added 8 commits January 10, 2024 11:43
Node objects no longer contain a 'revision' field.

Signed-off-by: Ricardo Cañuelo <[email protected]>
Implement a mechanism for dynamic polymorphism on Node objects by using
explicit pydantic object validation depending on the node kind and
storing all Node objects as plain nodes in the same DB collection
regardless of their type.

Signed-off-by: Ricardo Cañuelo <[email protected]>
Signed-off-by: Ricardo Cañuelo <[email protected]>
Perform node validation for the /nodes PUT endpoint.

Signed-off-by: Ricardo Cañuelo <[email protected]>
Sync test_create_node_endpoint to the latest model changes.

Signed-off-by: Ricardo Cañuelo <[email protected]>
Sync test_get_nodes_by_attributes_endpoint to the latest model changes

Signed-off-by: Ricardo Cañuelo <[email protected]>
Sync test_get_node_by_id_endpoint to the latest model changes

Signed-off-by: Ricardo Cañuelo <[email protected]>
Sync test_get_all_nodes to the latest model changes

Signed-off-by: Ricardo Cañuelo <[email protected]>
Ricardo Cañuelo added 3 commits January 10, 2024 16:33
Sync regression e2e tests to the latest model changes

Signed-off-by: Ricardo Cañuelo <[email protected]>
@r-c-n r-c-n force-pushed the rework-models branch 2 times, most recently from fc49c30 to 449535f Compare January 10, 2024 15:52
@r-c-n r-c-n removed the staging-skip Don't test automatically on staging.kernelci.org label Jan 11, 2024
Sync DB to model changes after these commits:

    api.models: basic definitions of Node submodels
    api.main: use node endpoints for all type of Node subtypes
    api.db: remove regression collection

Signed-off-by: Ricardo Cañuelo <[email protected]>
@r-c-n r-c-n force-pushed the rework-models branch 2 times, most recently from 1b08404 to 0bf1a2a Compare January 11, 2024 15:52
@nuclearcat nuclearcat left a comment

Tested manually on staging and followed all logs to make sure everything works.

@nuclearcat nuclearcat added this pull request to the merge queue Jan 11, 2024
Merged via the queue into kernelci:main with commit d9176e8 Jan 11, 2024
4 of 7 checks passed