add gpu awareness to queue_info #825
Merged
It's not clear if this is going to apply to everyone. Of course, it seems apparent to call a GPU TRES `.*gpu.*`, but I'm sure we'll run into someone who doesn't. I don't know how to handle that case. Maybe we'll need to allow for an environment variable configuration here?
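A minimal sketch of what an environment variable override could look like, assuming nothing about how the PR actually wires this in. The variable name `EXAMPLE_GPU_TRES_PATTERN` and both helper methods are hypothetical, and the default is the Slurm-oriented pattern proposed later in this thread.

```ruby
# Hypothetical environment variable name; it is not defined by this PR.
GPU_TRES_ENV_VAR = 'EXAMPLE_GPU_TRES_PATTERN'

# Default taken from the pattern suggested in this review thread.
DEFAULT_GPU_TRES_PATTERN = %r{^gres/gpu($|:)}

# Return the pattern used to recognise GPU TRES keys.
# A site can override the default by setting the environment variable
# to a regular expression source string.
def gpu_tres_pattern(env = ENV)
  custom = env[GPU_TRES_ENV_VAR]
  return DEFAULT_GPU_TRES_PATTERN if custom.nil? || custom.strip.empty?

  # Raises RegexpError if the site supplies an invalid expression.
  Regexp.new(custom)
end

# True when the given TRES key looks like a GPU resource.
def gpu_key?(key, env = ENV)
  gpu_tres_pattern(env).match?(key)
end
```

A site whose GPU resources are named differently could then set the variable to something like `^gres/accel` without a code change. Whether that is worth the extra configuration surface is a judgment call for the maintainers.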
For Slurm this would apply, as there are built-in TRES/GRES plugins that have the `gpu` prefix. I have no clue about other schedulers.
However, the regex might need to be adjusted to avoid matching a non-GPU TRES that has `gpu` in the name. The format for Slurm is `gres/gpu:<name>=<number>`, but you can also have just `gres/gpu=<number>`.
Some examples:
So that regex should probably be `%r{gres/gpu(:|=)}`.
Ah, this is keys, so maybe `%r{^gres/gpu($|:)}`. This ensures that if a site had a GRES named like `gres/gpu-thing`, it wouldn't be treated as a GPU job, since that GRES might be something different.
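To make the difference between the three patterns discussed so far concrete, here is a small comparison sketch. The sample keys are illustrative only (`v100` and `gpu-thing` are made-up names), and `matches_for` is a hypothetical helper, not code from the PR.

```ruby
# The three patterns that came up in this thread.
CANDIDATES = {
  loose:    /.*gpu.*/,           # original pattern: matches anything containing "gpu"
  value:    %r{gres/gpu(:|=)},   # suits full "key=value" strings, not bare keys
  anchored: %r{^gres/gpu($|:)}   # suits hash keys such as "gres/gpu" or "gres/gpu:<name>"
}.freeze

# Illustrative TRES keys; the names after "gres/gpu" are invented.
SAMPLE_KEYS = ['gres/gpu', 'gres/gpu:v100', 'gres/gpu-thing', 'cpu'].freeze

# Return the subset of keys that a pattern accepts.
def matches_for(pattern, keys = SAMPLE_KEYS)
  keys.select { |key| pattern.match?(key) }
end

CANDIDATES.each do |name, pattern|
  puts "#{name}: #{matches_for(pattern).inspect}"
end
```

Only the anchored pattern accepts both the bare and the typed GPU key while rejecting `gres/gpu-thing`. Note that `^` and `$` are line anchors in Ruby; `\A` and `\z` would be stricter if keys could ever contain newlines.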
Awesome! Yea, we're splitting here for keys and values, so for example these 2 get split and extracted into the hash.
Yea, I think this is what I was worried about, so I can update the regex.
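For readers without the diff in front of them, the key/value splitting described above can be sketched roughly as follows. This is an assumption about the general shape, not the PR's actual code: `parse_tres` and `gpu_keys` are hypothetical names, and the sample TRES string in the usage note is invented for illustration.

```ruby
# Split a comma-separated TRES string into a hash of key => value strings.
# Entries without an "=" are skipped.
def parse_tres(tres)
  tres.to_s.split(',').each_with_object({}) do |pair, hash|
    key, value = pair.split('=', 2)
    next if key.nil? || key.empty? || value.nil?

    hash[key.strip] = value.strip
  end
end

# Select the keys that look like Slurm GPU resources, using the
# anchored pattern suggested in this thread.
def gpu_keys(tres_hash)
  tres_hash.keys.grep(%r{^gres/gpu($|:)})
end
```

With an input such as `'cpu=48,gres/gpu=2,gres/gpu:v100-32g=2'`, the hash keys are `cpu`, `gres/gpu`, and `gres/gpu:v100-32g`, which is why the pattern has to match keys rather than `key=value` strings.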
Yea, same. A lot of this stuff will be Slurm only until someone can provide a patch for other schedulers.
I do have another question about the model - is the Slurm plugin guaranteed to have the GPU model in the name as well?
Taking this for example, is every Slurm site guaranteed to list out all the GPU models, like this queue having 2 `v100-32g`?
I do not believe that's guaranteed. This is the "type" in the GRES: https://slurm.schedmd.com/gres.conf.html#OPT_Type. It is documented as optional. Even if a site does specify the type, I'm not 100% certain it would show up in TRES unless the site also includes it in the accounting: https://slurm.schedmd.com/slurm.conf.html#OPT_AccountingStorageTRES
I'm not 100% certain whether accounting TRES configs affect job TRES availability. Either way, the type of GPU is optional, so that's not guaranteed.
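Since the type is optional, any code that reads the GPU model has to tolerate its absence. One possible way to do that is sketched below; `GPU_KEY` and `describe_gpu_key` are hypothetical names, and the stricter `\A`/`\z` anchors are a choice made for this example rather than something settled in the thread.

```ruby
# Matches "gres/gpu" and "gres/gpu:<type>", capturing the optional type.
GPU_KEY = %r{\Agres/gpu(?::(?<type>.+))?\z}

# Describe a TRES key.
# Returns nil for non-GPU keys, otherwise a hash whose :type is nil
# when the site does not expose a GPU type in the key.
def describe_gpu_key(key)
  match = GPU_KEY.match(key)
  return nil if match.nil?

  { gpu: true, type: match[:type] }
end
```

Callers can then treat a `nil` type as "GPU of unknown model" instead of assuming every site lists models such as `v100-32g`.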
OK, cool thanks for the info.