listing available gpus #3

dougbevan · 2018-11-19T20:28:05Z

This is very useful software. How can we list the available versus used GRES for GPUs?

For instance, if I do:

pestat -G

This partly works: I can see the GRES being used, but it doesn't show the GRES available.

For CPUs, you get to see used/total (in my case 0/48). How can I get a similar output for GPUs?

OleHolmNielsen · 2018-11-20T07:56:33Z

I believe the sinfo command will give you the desired information about GRES in nodes.
For example, use this command:

sinfo -o "%P %G %D %N"
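As a rough illustration (not part of pestat), the output of that command could be summed into per-partition GPU totals. The sketch below assumes that the %G field looks like gpu:TitanXP:2, that the GRES name for GPUs is "gpu", and that %D is the number of nodes on each line. The function name gpus_per_partition and the sample text are made up for this example.

```python
import re
from collections import defaultdict


def gpus_per_partition(sinfo_text):
    """Sum GPU totals per partition from `sinfo -o "%P %G %D %N"` output."""
    totals = defaultdict(int)
    for line in sinfo_text.splitlines():
        fields = line.split()
        # Skip blank lines and the header line (its NODES column is not a number).
        if len(fields) < 4 or not fields[2].isdigit():
            continue
        partition = fields[0].rstrip("*")  # "*" marks the default partition
        node_count = int(fields[2])
        # Drop parenthesised parts such as "(S:0-1)"; this also empties "(null)".
        gres = re.sub(r"\([^)]*\)", "", fields[1])
        for entry in gres.split(","):
            parts = entry.split(":")
            if parts[0] != "gpu":  # assumption: GPUs use the GRES name "gpu"
                continue
            # Assumption: the count is a plain integer; a missing count means 1.
            per_node = int(parts[-1]) if parts[-1].isdigit() else 1
            totals[partition] += per_node * node_count
    return dict(totals)


# Hypothetical output for ten identical nodes with two GPUs each.
sample = """PARTITION GRES NODES NODELIST
titanxp* gpu:TitanXP:2 10 n[1-10]
"""
print(gpus_per_partition(sample))  # {'titanxp': 20}
```

This only gives totals; it says nothing about how many GPUs are currently in use.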

dougbevan · 2018-11-20T14:28:25Z

That does give the total GPUs. It would be amazing to have output like this in pestat though, which gives a number of useful metrics all in one output and offers a great "quick glance" for our users.

With pestat -G we get a great output for cpus like:

Use/Tot
0 48

It would be useful to also see something like:

GRES GPUs
Use / Tot
2 8

OleHolmNielsen · 2018-11-21T11:04:22Z

I understand now, so I've added a new column GRES/node which is printed if you select the -G flag.
Can you try out the new script and tell me if this does what you want?

dougbevan · 2018-11-21T13:21:04Z

This is excellent. I tried it on one of our single node systems, and I see the available gpu and the GRES/job. Thanks for the addition -- this will be quite useful.

OleHolmNielsen · 2018-11-21T13:28:05Z

I'm glad this works for you! Please report any issues back to me.

cheekykite · 2021-02-23T05:40:37Z

Hello.

Thanks for providing a good tool.

"GRES/job" is not showing up in a clustered environment.
Can I get an opinion?

master:pestat]#
master:pestat]# ./pestat  -G
GRES (Generic Resource) is printed after each jobid
Hostname       Partition     Node Num_CPU  CPUload  Memsize  Freemem  GRES/   Joblist
                            State Use/Tot              (MB)     (MB)  node    JobId User GRES/job ...
      n1        titanxp*     idle   0   6    0.07     60000    62869  gpu:TitanXP:2
      n2        titanxp*     idle   0   6    0.01     60000    62952  gpu:TitanXP:2
      n3        titanxp*     idle   0   6    0.01     60000    62860  gpu:TitanXP:2
      n4        titanxp*     idle   0   6    0.01     60000    62891  gpu:TitanXP:2
      n5        titanxp*     idle   0   6    0.01     60000    62971  gpu:TitanXP:2
      n6        titanxp*     idle   0   6    0.09     60000    62945  gpu:TitanXP:2
      n7        titanxp*     idle   0   6    0.01     60000    63096  gpu:TitanXP:2
      n8        titanxp*     idle   0   6    0.02     60000    63084  gpu:TitanXP:2
      n9        titanxp*      mix   4   6    2.39*    60000    49649  gpu:TitanXP:2 2367 sonic  2360 sonic
     n10        titanxp*     idle   0   6    0.01     60000    63082  gpu:TitanXP:2
master:pestat]#
master:pestat]#
master:pestat]# sinfo --version
slurm 18.08.8
master:pestat]#
master:pestat]# cat /etc/redhat-release
CentOS Linux release 7.8.2003 (Core)
master:pestat]#
master:pestat]#

OleHolmNielsen · 2021-02-23T09:01:13Z

You're running an old and obsolete version of Slurm. Later versions have significantly improved GPU support, so maybe that's why you don't get the expected information.

The pestat command obtains information from Slurm with:
sinfo -h -N $partition $hostlist $statelist -o "%N %P %C %O %m %e %t %Z %G"
where the %G option prints:
%G Generic resources (gres) associated with the nodes.
Please check "man sinfo" in your Slurm version to see if %G exists.
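To make the field order concrete, here is a small sketch of how one line of that sinfo output could be split up. It is not how pestat does it (pestat uses awk); the class NodeInfo, the function parse_node_line and the sample line are hypothetical. It relies on %C being printed as allocated/idle/other/total, which is where the CPU "Use/Tot" numbers come from.

```python
from dataclasses import dataclass


@dataclass
class NodeInfo:
    name: str        # %N
    partition: str   # %P
    cpus_used: int   # allocated part of %C
    cpus_total: int  # total part of %C
    cpu_load: str    # %O (kept as text, since it may not be numeric)
    mem_mb: str      # %m
    free_mb: str     # %e
    state: str       # %t
    threads: str     # %Z
    gres: str        # %G


def parse_node_line(line):
    """Split one line of `sinfo -h -N -o "%N %P %C %O %m %e %t %Z %G"`."""
    name, partition, cpus, load, mem, free, state, threads, gres = line.split()
    # %C has the form allocated/idle/other/total.
    allocated, _idle, _other, total = (int(x) for x in cpus.split("/"))
    return NodeInfo(name, partition, allocated, total, load, mem, free,
                    state, threads, gres)


# Hypothetical line, modelled on node n9 in the output above.
node = parse_node_line("n9 titanxp* 4/2/0/6 2.39 60000 49649 mix 1 gpu:TitanXP:2")
print(f"{node.name}: CPUs {node.cpus_used}/{node.cpus_total}, GRES {node.gres}")
```

If the GRES field comes back as (null) for a node that has GPUs, the problem is likely in the Slurm configuration or version rather than in the parsing.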

OleHolmNielsen · 2021-03-30T09:16:25Z

Can you please test the latest version of pestat? The GRES/job is now being printed correctly.

clue2 · 2022-08-17T04:20:38Z

It might be helpful to change the formatting option from -o to -O to make use of the extra output fields (such as GresUsed):

sinfo -h -N $partition $hostlist $statelist -o "%N %P %C %O %m %e %t %Z %G"

becomes:

sinfo -h -N $partition $hostlist $statelist -O "Nodes,Partition,CPUsState,CPUsLoad,Memory,FreeMem,StateCompact,Threads,Gres"

You could then add GresUsed (ideally cleaning it up a bit) to get a more helpful overview of how many GPUs are in use and how many are available on a node.

yzs981130 · 2022-08-17T11:48:20Z

"change the formatting from -o to -O"

Slurm_tools/pestat/pestat, line 340 in 21ef8a6:

           $prefix/sinfo -h -N $all_partitions $partition $hostlist $statelist -O "NodeList:30,Partition:30,CPUsState:30,CPUsLoad:30,Memory:30,FreeMem:30,StateCompact:30,Threads:30,Gres:30" | $my_awk '

I believe pestat has already used -O to retrieve information.
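For readers who want to post-process that -O output outside of awk, here is a minimal sketch of slicing the fixed-width columns. It assumes every field is padded to the 30 characters requested in the command above. The names FIELDS, WIDTH and split_fixed_width are made up for this example, and the sample line is constructed by hand rather than taken from a real cluster.

```python
# Field names and width as requested with -O "Name:30,..." in the command above.
FIELDS = ["NodeList", "Partition", "CPUsState", "CPUsLoad", "Memory",
          "FreeMem", "StateCompact", "Threads", "Gres"]
WIDTH = 30


def split_fixed_width(line, fields=FIELDS, width=WIDTH):
    """Cut one output line into named columns of equal width."""
    return {name: line[i * width:(i + 1) * width].strip()
            for i, name in enumerate(fields)}


# Hand-built sample line: each value is left-justified in a 30-character column.
values = ["n9", "titanxp*", "4/2/0/6", "2.39", "60000", "49649",
          "mix", "1", "gpu:TitanXP:2"]
sample_line = "".join(v.ljust(WIDTH) for v in values)

row = split_fixed_width(sample_line)
print(row["NodeList"], row["CPUsState"], row["Gres"])
```

Slicing by position rather than splitting on whitespace keeps the columns aligned even when a value fills its whole column and touches the next one.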

I ran into the same need as you: a node-level GresUsed in the output of pestat. I have added it in my personal fork: yzs981130@7e711af. I hope it helps!

cc @OleHolmNielsen What do you think about a node-level GresUsed? Since this is my first time using awk, I would send it as a draft PR if you think it is needed.

clue2 · 2022-08-22T01:02:54Z

You're right, it does use -O now - I hadn't actually checked the code & was just going by the comments above. Thanks!

In that case, just changing Gres to GresUsed does a good enough job.

In your fork the formatting has become a bit off for me:

OleHolmNielsen · 2022-08-26T12:20:55Z

Thank you for your suggestion.
The GRES output shows how many GPUs are physically in the node.

With "pestat -G" the GRES used by each job on the node is printed. One could count manually how many GPUs are used.

I agree that the "sinfo -O GRESUSED" gives a useful summary of how many GPUs are in use.

However, I think that printing both GRES and GRESUSED data makes the output very long and difficult to read.

Maybe one could think of simplifying by having a "Num_GPU" column with simply the "Use/Tot" numbers. Some complicated parsing of GRES and GRESUSED would be needed.

There could be non-GPU types of GRES, see https://slurm.schedmd.com/gres.conf.html

Do you have suggestions for making the output of pestat more useful and simple to read?

OleHolmNielsen · 2022-08-26T12:38:55Z

Note added: Sites have to define their own GRES types in slurm.conf using the GresTypes parameter.
It can become complex for "pestat" to decode all possible GRES types and extract numbers for "Use/Tot".
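To give a feeling for the parsing involved, here is a minimal sketch of deriving "Use/Tot" per GRES name from the Gres and GresUsed strings. It is only an illustration and not pestat code. It assumes that entries look like name[:type]:count, optionally followed by a parenthesised suffix such as (IDX:0) or (S:0-1), and that counts are plain integers. The function names gres_counts and use_tot are hypothetical.

```python
import re


def gres_counts(gres):
    """Map GRES name -> count, e.g. 'gpu:TitanXP:2(S:0),mps:200'."""
    counts = {}
    # Remove parenthesised suffixes first, since they may contain commas.
    # This also turns the "(null)" placeholder into an empty string.
    cleaned = re.sub(r"\([^)]*\)", "", gres)
    for entry in cleaned.split(","):
        parts = entry.split(":")
        name = parts[0]
        if not name:
            continue
        # Assumption: counts are plain integers; an entry without one counts as 1.
        count = int(parts[-1]) if parts[-1].isdigit() else 1
        # Different types of the same GRES name (e.g. two GPU models) are summed.
        counts[name] = counts.get(name, 0) + count
    return counts


def use_tot(gres, gres_used, names=("gpu",)):
    """Return {name: (used, total)} for the GRES names a site cares about."""
    total = gres_counts(gres)
    used = gres_counts(gres_used)
    return {n: (used.get(n, 0), total[n]) for n in names if n in total}


print(use_tot("gpu:TitanXP:2", "gpu:TitanXP:1(IDX:0)"))  # {'gpu': (1, 2)}
```

Passing the names explicitly (for example the values from the site's GresTypes) avoids guessing which GRES types should get a "Use/Tot" column.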
