[RHOAIENG-8816] Automatic discovery of Accelerator profile handles AMD GPUs like they were NVIDIA #3064

Merged
merged 1 commit into from
Aug 12, 2024

Conversation

Contributor

@jpuzz0 jpuzz0 commented Aug 5, 2024

https://issues.redhat.com/browse/RHOAIENG-8816

Description

Only create the accelerator profile when NVIDIA-type nodes are detected in the server's list of nodes.
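As a rough illustration of the idea (not the actual diff in this PR), the check could look something like the sketch below. The `NodeLike` type and both function names are hypothetical, and detecting NVIDIA nodes through the `nvidia.com/gpu` capacity entry is an assumption about how such nodes are identified; the real code in `backend/src/utils/resourceUtils.ts` may do it differently.

```typescript
// Hypothetical, simplified node shape; a real Kubernetes Node object has many more fields.
type NodeLike = {
  status?: { capacity?: Record<string, string> };
};

// Assumption: NVIDIA nodes are recognised by the "nvidia.com/gpu" extended resource
// that the NVIDIA device plugin advertises in a node's capacity.
const NVIDIA_GPU_RESOURCE = 'nvidia.com/gpu';

// Returns true if at least one node reports a positive NVIDIA GPU capacity.
function hasNvidiaNodes(nodes: NodeLike[]): boolean {
  return nodes.some((node) => {
    const count = Number(node.status?.capacity?.[NVIDIA_GPU_RESOURCE] ?? 0);
    return count > 0; // NaN or 0 both count as "no NVIDIA GPU"
  });
}

// Hypothetical gate mirroring the condition described in this PR:
// create the profile only when detection is configured AND NVIDIA nodes exist.
function shouldCreateAcceleratorProfile(configured: boolean, nodes: NodeLike[]): boolean {
  return configured && hasNvidiaNodes(nodes);
}
```

With a gate like this, a cluster that only has AMD GPU nodes (for example, nodes advertising `amd.com/gpu`) would not get an NVIDIA accelerator profile created automatically.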

How Has This Been Tested?

Manually tested. This doesn't seem to be reproducible in dev without some hard-coded intervention. The condition to create the accelerator has been adjusted to make sure NVIDIA nodes exist; however, I was not able to simulate the "configured" state being true, which is the other part of the condition.

Test steps

Testing this is not straightforward: it requires deleting resources in the OpenShift console, restarting the backend afterwards, and manually editing the condition to bypass the "configured" state. I believe we can have the reporter of the issue test this change once it is released, to be sure it resolves his issue.

  1. Update
    if (acceleratorDetected.configured && hasNvidiaNodes) {
    locally and remove the acceleratorDetected.configured portion of the condition.
  2. Stop your locally running dev "backend"
  3. Delete console resources:
    [screenshots: Pasted Graphic 2, Pasted Graphic 3]
  5. Start your local dev "backend" and see the accelerator profile created. If no Nvidia nodes exist, then no accelerator should be created.

Request review criteria:

Self checklist (all need to be checked):

  • The developer has manually tested the changes and verified that the changes work
  • Commits have been squashed into descriptive, self-contained units of work (e.g. 'WIP' and 'Implements feedback' style messages have been removed)
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has added tests or explained why testing cannot be added (unit or cypress tests for related changes)

If you have UI changes:

  • Included any necessary screenshots or gifs if it was a UI change.
  • Included tags to the UX team if it was a UI/UX change.

After the PR is posted & before it merges:

  • The developer has tested their solution on a cluster by using the image produced by the PR to main

@openshift-ci openshift-ci bot requested review from mturley and pnaik1 August 5, 2024 15:03
@jpuzz0 jpuzz0 force-pushed the RHOAIENG-8816 branch 2 times, most recently from 3f02696 to 6132e36 Compare August 8, 2024 14:17
@pnaik1
Contributor

pnaik1 commented Aug 9, 2024

@jpuzz0, I was trying to test your PR locally and followed your test steps. After restarting my backend, I see the
migration-gpu-status config automatically created in the OpenShift console. Does that mean I have an NVIDIA node?
If so, how can I test for no NVIDIA node?

@jpuzz0
Contributor Author

jpuzz0 commented Aug 9, 2024

> @jpuzz0, I was trying to test your PR locally and followed your test steps. After restarting my backend, I see the migration-gpu-status config automatically created in the OpenShift console. Does that mean I have an NVIDIA node? If so, how can I test for no NVIDIA node?

@pnaik1
No, migration-gpu-status will be automatically created in the console regardless. If you want to create different node types I suggest reaching out to #forum-openshift-ai-dashboard.

backend/src/utils/resourceUtils.ts (outdated, resolved review thread)
@Gkrumbach07
Member

perfect thank you

/lgtm

@openshift-ci openshift-ci bot added the lgtm label Aug 12, 2024
@Gkrumbach07
Member

/approve

Contributor

openshift-ci bot commented Aug 12, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Gkrumbach07

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit 807e556 into opendatahub-io:main Aug 12, 2024
6 checks passed