Timeout service failing to update nodes #315

Closed
gctucker opened this issue Sep 1, 2023 · 6 comments

@gctucker
Contributor

gctucker commented Sep 1, 2023

After running the pipeline for a while without the timeout services enabled, starting them produces lots of errors like this:

09/01/2023 11:32:46 AM UTC [DEBUG] 64bfa6baa739c92a578d8d7e TIMEOUT
09/01/2023 11:32:46 AM UTC [ERROR] Unauthorized to complete the operation

The only error from the API logs is:

PUT /latest/node/64bfa6baa739c92a578d8d7e HTTP/1.1" 401 Unauthorized

It would probably be good to print any detailed error message returned by the API in the timeout service, to know what is going on.
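As a rough illustration of that suggestion, here is a minimal sketch of turning an API error response into a log message that keeps whatever detail the server sent. The function name `format_api_error` is hypothetical, and the assumption that the response body is JSON with a `detail` key is mine, not something confirmed in this thread; the sketch falls back to the raw body when that assumption does not hold.

```python
import json


def format_api_error(status_code, body):
    """Build a log message that includes the API's error detail, if any.

    Assumption: error responses may carry a JSON object with a "detail"
    key. If the body is not JSON or has no such key, the raw text is used.
    """
    try:
        payload = json.loads(body)
    except (TypeError, ValueError):
        # Body is missing or not valid JSON
        payload = None

    detail = payload.get("detail") if isinstance(payload, dict) else None
    if detail is None:
        # Fall back to the raw body, or a placeholder when it is empty
        if isinstance(body, str) and body.strip():
            detail = body.strip()
        else:
            detail = "no detail provided"
    return f"HTTP {status_code}: {detail}"
```

The timeout service could then log the returned string at ERROR level next to the node ID instead of a generic message.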

That particular node:

$ ./kci node get 64bfa6baa739c92a578d8d7e
{
    "id": "64bfa6baa739c92a578d8d7e",
    "kind": "node",
    "name": "checkout",
    "path": [
        "checkout"
    ],
    "group": null,
    "revision": {
        "tree": "kernelci",
        "url": "https://github.com/kernelci/linux.git",
        "branch": "staging-mainline",
        "commit": "d7ca2cbbc144b632481dd023a9d925d677cbf6b3",
        "describe": null,
        "version": null
    },
    "parent": null,
    "state": "running",
    "result": null,
    "artifacts": null,
    "data": null,
    "created": "2023-07-25T10:40:58.838000",
    "updated": "2023-07-25T10:40:58.838000",
    "timeout": "2023-07-25T11:40:58.773000",
    "holdoff": null,
    "owner": "admin",
    "user_groups": []
}
@gctucker
Contributor Author

gctucker commented Sep 1, 2023

Ah I think that may be because the node is owned by the admin user but the user running the timeout service is kernelci-pipeline. So we probably need a specific user for the timeout services and other system-wide operations done on the client side.

@gctucker
Contributor Author

gctucker commented Sep 1, 2023

One way to solve this is to only update the nodes that belong to the user running the timeout service. Another approach is to run with a user authorised to update all the nodes, but just using an admin token would be a bit overkill as that also allows managing users etc. For now I think it's fine to use an admin token but something cleaner will be needed eventually in production.
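To make the first option concrete, here is a minimal sketch of selecting only the expired nodes owned by the user running the service. The function name `nodes_to_time_out` and the `active_states` default are assumptions for illustration (the two states are simply the ones seen on the nodes quoted in this issue); the field names `owner`, `state` and `timeout` come from the node JSON above.

```python
from datetime import datetime


def nodes_to_time_out(nodes, current_user, now,
                      active_states=("running", "closing")):
    """Return the nodes owned by current_user that are past their timeout.

    Assumption: only nodes in one of active_states are candidates, and
    timestamps are naive ISO 8601 strings as in the node JSON.
    """
    selected = []
    for node in nodes:
        if node.get("owner") != current_user:
            continue  # skip nodes this user is not allowed to update
        if node.get("state") not in active_states:
            continue  # skip nodes that are already finished
        timeout = node.get("timeout")
        if timeout and datetime.fromisoformat(timeout) < now:
            selected.append(node)
    return selected
```

With this filter in place, nodes owned by other users are left untouched, so the service never sends an update it is not authorised to make.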

@gctucker
Contributor Author

gctucker commented Sep 1, 2023

Alright I've set it up with an admin API token and the node mentioned above was transitioned to timeout. However, I now have this issue with other nodes:

09/01/2023 11:55:18 AM UTC [DEBUG] 64ed914bc9c7d64620957456 TIMEOUT
09/01/2023 11:55:18 AM UTC [ERROR] Unauthorized to complete the operation
{
    "id": "64ed914bc9c7d64620957456",
    "kind": "node",
    "name": "checkout",
    "path": [
        "checkout"
    ],
    "group": null,
    "revision": {
        "tree": "kernelci",
        "url": "https://github.com/kernelci/linux.git",
        "branch": "staging-mainline",
        "commit": "e80086b5c4f582f217ae8fcecd2082c784cc07d6",
        "describe": "staging-mainline-20230829.0",
        "version": {
            "version": 6,
            "patchlevel": 5,
            "sublevel": null,
            "extra": "-1208-ge80086b5c4f5",
            "name": null
        }
    },
    "parent": null,
    "state": "closing",
    "result": null,
    "artifacts": {
        "tarball": "https://kciapistagingstorage1.file.core.windows.net/staging/linux-kernelci-staging-mainline-staging-mainline-20230829.0.tar.gz?sv=2022-11-02&ss=bfqt&srt=sco&sp=r&se=2123-07-20T22:00:00Z&st=2023-07-21T18:27:25Z&spr=https&sig=TDt3NorDXylmyUtBQnP1S5BZ3uywR06htEGTG%2BSxLWg%3D"
    },
    "data": null,
    "created": "2023-08-29T06:33:47.807000",
    "updated": "2023-08-29T06:48:47.727000",
    "timeout": "2023-08-29T07:33:47.727000",
    "holdoff": "2023-08-29T06:48:18.569000",
    "owner": "staging.kernelci.org",
    "user_groups": []
}

I would have expected the admin API token to be authorized to update all the nodes. I don't think the magic admin group name is a very good approach; there should instead be a special admin flag in the schema for groups and users.
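To illustrate the idea of an explicit flag, here is a minimal sketch of an authorization check that does not depend on a magic group name. Everything in it is hypothetical: the `User` class, the `is_admin` field and `can_update_node` are not taken from the API code, and the rule that a shared entry in `user_groups` grants update rights is my assumption about how that node field is meant to work.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    """Hypothetical user model with an explicit admin flag."""
    username: str
    groups: list = field(default_factory=list)
    is_admin: bool = False  # explicit flag instead of a magic "admin" group


def can_update_node(user, node):
    """Return True if user may update node (a dict shaped like the JSON above)."""
    if user.is_admin:
        return True  # admins may update any node
    if node.get("owner") == user.username:
        return True  # owners may update their own nodes
    # Assumption: sharing a group listed in the node's user_groups grants access
    node_groups = node.get("user_groups") or []
    return any(group in node_groups for group in user.groups)
```

With a flag like this, a token for an admin user would pass the check for the staging.kernelci.org node above regardless of which groups that user belongs to.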

@gctucker
Contributor Author

gctucker commented Sep 1, 2023

See #316 to work around this issue by only dealing with nodes owned by the current user.

@JenySadadia
Collaborator

I believe kernelci/kernelci-api#423 will fix this issue.

@JenySadadia
Collaborator

This issue has not been observed after merging kernelci/kernelci-api#423.
Hence, closing it.
