Rework models #395
Conversation
@gtucker despite the pending tasks before getting this ready for merging, I think it's worth having a look at it and discussing the possible drawbacks. I think this is a good solution and it fixes the needs that we had regarding model types, validation and endpoint ergonomics. To do:
@nuclearcat sure, we can do that. Ultimately, it's a matter of finding the right balance between query performance (embedding node data into other nodes) and DB size (saving node pointers instead). Also, in cases where the pointed-to data is prone to change, I don't think we should embed it into chained nodes. But in this particular case the data should be stable, so it'll work. I'll make the changes. Thanks for reviewing!
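The trade-off discussed above can be illustrated with two hypothetical node documents (all field values here are made up for illustration):

```python
# Hypothetical illustration of the embed-vs-pointer trade-off: a child node
# can either carry a copy of the parent's data or keep only a pointer to it.

embedded_kbuild = {
    "kind": "kbuild",
    "name": "kbuild-gcc-x86",
    # Embedded copy: faster queries (no extra lookup), larger documents,
    # and it can go stale if the pointed-to data ever changes.
    "data": {"kernel_revision": {"tree": "mainline", "commit": "deadbeef"}},
}

pointer_kbuild = {
    "kind": "kbuild",
    "name": "kbuild-gcc-x86",
    # Pointer only: smaller documents, but reading the revision costs an
    # extra lookup on the parent checkout node.
    "parent": "64f1c0ffee",  # made-up node id
    "data": {},
}
```

Since a checkout's revision is stable once created, embedding it in descendant nodes is safe here.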
Node objects no longer contain a 'revision' field. Signed-off-by: Ricardo Cañuelo <[email protected]>
Implement a mechanism for dynamic polymorphism on Node objects by using explicit pydantic object validation depending on the node kind and storing all Node objects as plain nodes in the same DB collection regardless of their type. Signed-off-by: Ricardo Cañuelo <[email protected]>
Signed-off-by: Ricardo Cañuelo <[email protected]>
Perform node validation for the /nodes PUT endpoint. Signed-off-by: Ricardo Cañuelo <[email protected]>
Sync test_create_node_endpoint to the latest model changes. Signed-off-by: Ricardo Cañuelo <[email protected]>
Sync test_get_nodes_by_attributes_endpoint to the latest model changes. Signed-off-by: Ricardo Cañuelo <[email protected]>
Sync test_get_node_by_id_endpoint to the latest model changes. Signed-off-by: Ricardo Cañuelo <[email protected]>
Sync test_get_all_nodes to the latest model changes. Signed-off-by: Ricardo Cañuelo <[email protected]>
Sync regression e2e tests to the latest model changes. Signed-off-by: Ricardo Cañuelo <[email protected]>
Signed-off-by: Ricardo Cañuelo <[email protected]>
Signed-off-by: Ricardo Cañuelo <[email protected]>
Sync DB to model changes after these commits:
- api.models: basic definitions of Node submodels
- api.main: use node endpoints for all types of Node subtypes
- api.db: remove the regression collection

Signed-off-by: Ricardo Cañuelo <[email protected]>
Tested manually on staging and followed all the logs to make sure everything works.
NOTE: These changes will break an API instance that is connected to a DB with existing nodes, since they change the node models. The DB migration script included in the changes handles the data migration cleanly (hopefully).
Related PRs:
kernelci/kernelci-core#2224 and this PR are interdependent, so they should be merged together. I suggest merging kernelci/kernelci-core#2224 first, then this one, and then kernelci/kernelci-pipeline#365. Remember to apply the migration scripts.
This PR has already been tested with the related PR applied, and all tests pass.
Definition of Node and Node submodels
This is the main change of this PR. The initial requirements that led to this are:
These requirements fit well in an object-oriented design where a base model (`Node`) provides the base graph structure as well as other metadata attributes (dates, ownership, etc.), and subclasses of this model provide the specific attributes that are relevant to each object type.

A way to implement this so that we can carry the object-oriented nature throughout the application is to define all the subclass-specific attributes inside a `data` field, which will be a dict with arbitrary data from the point of view of the base class `Node` but will be defined with specific fields in each concrete subclass model.

In this scheme, `Node` is almost like an abstract class (we can create plain `Node`s, but the contents of the `data` field won't be validated), and the submodels will be the concrete implementation subclasses that will be fully validated depending on their type, which is encoded in the `kind` field.

This way we can effectively implement dynamic polymorphism in data storage and retrieval. The lower layers (DB) aren't aware of the different `Node` types, as they all share the same structure, the only differences being the contents of the `data` dict. All nodes are thus retrieved as plain `Node` objects, which are then interpreted and validated as concrete types by the API.
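As a rough sketch of this scheme (assuming pydantic; the class names, fields and the `parse_node` helper below are illustrative, not necessarily what the PR implements):

```python
# Sketch of kind-based dynamic polymorphism over a single Node shape.
# Assumes pydantic; names and fields are illustrative.
from typing import Any, Dict, Optional
from pydantic import BaseModel, Field


class Node(BaseModel):
    """Base model: graph structure plus metadata, with free-form data."""
    id: Optional[str] = None
    kind: str = "node"
    name: str
    parent: Optional[str] = None  # pointer to the parent node
    data: Dict[str, Any] = Field(default_factory=dict)  # not validated here


class CheckoutData(BaseModel):
    """Concrete schema for a checkout's 'data' contents."""
    kernel_revision: Dict[str, Any]


class Checkout(Node):
    kind: str = "checkout"
    data: CheckoutData  # 'data' is now fully validated


# Map each 'kind' value to its concrete model so retrieval can re-validate.
KIND_MAP = {"node": Node, "checkout": Checkout}


def parse_node(obj: dict) -> Node:
    """Validate a raw DB document as its concrete Node subtype."""
    model = KIND_MAP.get(obj.get("kind", "node"), Node)
    return model.parse_obj(obj)
```

Every document lives in the same collection with the same top-level shape; only the interpretation of `data` changes with `kind`.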
New models
Checkout

The specific data of a checkout will be the `kernel_revision`.

Kbuild

The specific data of a kernel build will be the `kernel_revision`, `arch`, `defconfig`, `compiler` and a list of extra configuration `fragments`.

NOTE: if a kernel build node will always hang from a checkout node, then the `kernel_revision` can be retrieved from its parent node.
Test

The specific data of a test will be the `kernel_revision` and the information about the test: source/repo and revision.

NOTE: if a test node will always hang from a kbuild node, then the `kernel_revision` can be retrieved from its parent node.
Regression
The specific data of a regression will be the node that introduced the failure and the previous one that passed, i.e., a regression is defined by a breaking point between nodes.
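The per-kind `data` submodels described above could look roughly like this (assuming pydantic; field names follow the text but the actual PR may name them differently):

```python
# Illustrative sketch of the per-kind "data" submodels; assumes pydantic.
# Field names follow the PR description but are not guaranteed to match it.
from typing import Any, Dict, List
from pydantic import BaseModel, Field


class KbuildData(BaseModel):
    """Specific data of a kernel build node."""
    kernel_revision: Dict[str, Any]
    arch: str
    defconfig: str
    compiler: str
    fragments: List[str] = Field(default_factory=list)  # extra config fragments


class TestData(BaseModel):
    """Specific data of a test node: revision plus the test's source info."""
    kernel_revision: Dict[str, Any]
    test_source: str    # source/repo of the test (hypothetical field name)
    test_revision: str  # revision of the test (hypothetical field name)


class RegressionData(BaseModel):
    """A regression is a breaking point between two nodes."""
    fail_node: str  # id of the node that introduced the failure
    pass_node: str  # id of the previous node that passed
```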
Endpoint changes
The endpoints no longer map a node `kind` to a DB collection. All concrete `Node` objects are saved as `Node`s. The difference lies in the implementation of the operations that create nodes, which now perform object validation explicitly.

This means that we don't need a different endpoint for each node type, which leads to a simpler and cleaner implementation that takes advantage of the object-oriented design where possible. The drawback is that each node POST/PUT is now validated twice: once as a `Node` and once as a concrete `Node` subtype. I don't know if this could have a noticeable performance impact, but IMO performance doesn't seem to be a critical aspect of the system (we're using Python for everything, after all), and the pros of this implementation far outweigh the cons.

The "regression" endpoints are removed since all `Node` subtypes can now work through the "node" endpoints.
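The double-validation path could be sketched like this (a hypothetical `create_node` helper, assuming pydantic; the real endpoint code may differ):

```python
# Sketch of the double validation a single POST/PUT node path performs:
# first as a base Node, then as the concrete subtype selected by 'kind'.
# Assumes pydantic; all names here are illustrative.
from typing import Any, Dict
from pydantic import BaseModel, Field


class Node(BaseModel):
    kind: str = "node"
    name: str
    data: Dict[str, Any] = Field(default_factory=dict)


class CheckoutData(BaseModel):
    kernel_revision: Dict[str, Any]


class Checkout(Node):
    kind: str = "checkout"
    data: CheckoutData


KIND_MAP = {"node": Node, "checkout": Checkout}


def create_node(payload: dict) -> Node:
    """Single entry point for every kind: validate generically, then concretely."""
    node = Node.parse_obj(payload)        # first pass: base Node structure
    model = KIND_MAP.get(node.kind, Node)
    concrete = model.parse_obj(payload)   # second pass: concrete subtype
    # ...here the endpoint would store the document in the single
    # 'nodes' collection, regardless of its kind...
    return concrete
```

A malformed payload fails on the second pass, so invalid `data` never reaches the DB even though the collection itself is schemaless.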