listing available gpus #3
I believe the sinfo command will give you the desired information about GRES in nodes:
sinfo -o "%P %G %D %N"
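For a cluster-wide total, the output of that sinfo command can be summed per partition. Below is a minimal Python sketch of that idea. `gpus_per_partition` is a hypothetical helper, the sample output is made up for illustration, and it assumes GPU entries of the form gpu[:type]:count, such as gpu:TitanXP:2.

```python
from collections import defaultdict
from typing import Dict


def gpus_per_partition(sinfo_output: str) -> Dict[str, int]:
    """Sum GPUs per partition from `sinfo -o "%P %G %D %N"` output.

    Each data line is: PARTITION GRES NODES NODELIST. A partition may
    span several lines (e.g. nodes with different GRES); summing the
    lines handles that.
    """
    totals: Dict[str, int] = defaultdict(int)
    for line in sinfo_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 3 or not fields[2].isdigit():
            continue
        partition = fields[0].rstrip("*")  # "*" marks the default partition
        nodes = int(fields[2])
        for entry in fields[1].split(","):
            base = entry.split("(", 1)[0]  # drop a suffix such as "(S:0-1)"
            parts = base.split(":")
            # Assumed entry form: gpu[:type]:count, e.g. gpu:TitanXP:2
            if parts[0] == "gpu" and len(parts) > 1 and parts[-1].isdigit():
                totals[partition] += int(parts[-1]) * nodes
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical sinfo output, for illustration only.
    sample = """PARTITION GRES NODES NODELIST
titanxp* gpu:TitanXP:2 10 n[1-10]
cpu (null) 4 c[1-4]"""
    print(gpus_per_partition(sample))  # {'titanxp': 20}
```

Partitions without GPU GRES, such as the "(null)" line above, simply do not appear in the result.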
That does give the total GPUs. It would be amazing to have output like this in pestat, though, since it gives a number of useful metrics all in one output and provides a great "quick glance" for our users. With pestat -G we get great output for CPUs, like:
Use/Tot
It would be useful to also see something like:
GRES GPUs
I understand now, so I've added a new column GRES/node, which is printed if you select the -G flag.
This is excellent. I tried it on one of our single-node systems, and I see the available GPU and the GRES/job. Thanks for the addition; this will be quite useful.
I'm glad this works for you! Please report any issues back to me.
Hello. Thanks for providing a good tool. "GRES/job" is not showing up in a clustered environment:

master:pestat]# ./pestat -G
GRES (Generic Resource) is printed after each jobid
Hostname  Partition  Node   Num_CPU  CPUload  Memsize  Freemem  GRES/          Joblist
                     State  Use/Tot           (MB)     (MB)     node           JobId User GRES/job ...
n1        titanxp*   idle   0   6    0.07     60000    62869    gpu:TitanXP:2
n2        titanxp*   idle   0   6    0.01     60000    62952    gpu:TitanXP:2
n3        titanxp*   idle   0   6    0.01     60000    62860    gpu:TitanXP:2
n4        titanxp*   idle   0   6    0.01     60000    62891    gpu:TitanXP:2
n5        titanxp*   idle   0   6    0.01     60000    62971    gpu:TitanXP:2
n6        titanxp*   idle   0   6    0.09     60000    62945    gpu:TitanXP:2
n7        titanxp*   idle   0   6    0.01     60000    63096    gpu:TitanXP:2
n8        titanxp*   idle   0   6    0.02     60000    63084    gpu:TitanXP:2
n9        titanxp*   mix    4   6    2.39*    60000    49649    gpu:TitanXP:2  2367 sonic 2360 sonic
n10       titanxp*   idle   0   6    0.01     60000    63082    gpu:TitanXP:2
master:pestat]# sinfo --version
slurm 18.08.8
master:pestat]# cat /etc/redhat-release
CentOS Linux release 7.8.2003 (Core)
You're running an old and obsolete version of Slurm. Later versions have significantly improved GPU support, so maybe that's why you don't get the expected information. The pestat command obtains information from Slurm with:
Can you please test the latest version of pestat? The GRES/job is now being printed correctly.
Line 340 in 21ef8a6
I believe I am experiencing the same problem as you, to add a node-level
cc @OleHolmNielsen What do you think about the node-level
Thank you for your suggestion. With "pestat -G", the GRES used by each job on the node is printed, so one could count manually how many GPUs are used. I agree that "sinfo -O GRESUSED" gives a useful summary of how many GPUs are in use. However, I think that printing both the GRES and GRESUSED data would make the output very long and difficult to read.

Perhaps the output could be simplified with a "Num_GPU" column showing just the "Use/Tot" numbers. That would require some complicated parsing of GRES and GRESUSED, and there could be non-GPU types of GRES, see https://slurm.schedmd.com/gres.conf.html

Do you have suggestions for making the output of pestat more useful and simple to read?
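As a rough illustration of the parsing mentioned above, here is a minimal Python sketch that derives a "Num_GPU" Use/Tot value from one node's GRES and GRESUSED strings. It is not how pestat works, and the function names are hypothetical. The input formats are assumptions: "gpu:TitanXP:2" as seen in the pestat output above, and GRESUSED entries such as "gpu:TitanXP:1(IDX:0)" as printed by recent Slurm releases (older releases such as 18.08 may format this differently). Non-GPU GRES types are skipped.

```python
from typing import List


def split_gres(field: str) -> List[str]:
    """Split a GRES list on commas that are not inside parentheses,
    so "gpu:a100:2(IDX:0,2),mps:100" yields two entries, not three."""
    items: List[str] = []
    current = ""
    depth = 0
    for ch in field:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            items.append(current)
            current = ""
        else:
            current += ch
    items.append(current)
    return [item.strip() for item in items if item.strip()]


def count_gres(field: str, name: str = "gpu") -> int:
    """Sum the counts of all entries whose GRES name equals `name`.

    Other GRES types (e.g. "mps:100") are ignored; "(null)" counts as 0.
    Assumed entry form: name[:type]:count[(suffix)].
    """
    total = 0
    for item in split_gres(field):
        base = item.split("(", 1)[0]  # drop "(S:0-1)" or "(IDX:0)" suffixes
        parts = base.split(":")
        if parts[0] == name and len(parts) > 1 and parts[-1].isdigit():
            total += int(parts[-1])
    return total


def gpu_use_tot(gres: str, gres_used: str) -> str:
    """Format a hypothetical Num_GPU column as "Use/Tot", like Num_CPU."""
    return f"{count_gres(gres_used)}/{count_gres(gres)}"


if __name__ == "__main__":
    print(gpu_use_tot("gpu:TitanXP:2", "gpu:TitanXP:1(IDX:0)"))  # 1/2
```

On a live system the two strings could come from something like `sinfo -N -O "NodeList,Gres,GresUsed"`, though the field widths may need widening so that long GRES strings are not truncated.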
Note added: Sites have to define their own GRES types in slurm.conf using the GresTypes parameter.
This is very useful software. How can we list the available vs. used GRES for GPUs?
For instance, if I do:
pestat -G
This is partially good, as I can see the GRES being used. But it doesn't show the GRES available.
For CPUs, you get to see used/total (in my case 0/48). How can I get similar output for GPUs?