Hello
While finishing the switch from ZES_ENABLE_SYSMAN=1 to zesInit() in hwloc, I have to remove some duplicate code that was used in the past when Sysman could not be enabled. One such piece is the query of the PCI properties of the device.
I was notified that PVC reports different PCI maxBandwidth values on Aurora from zesDevicePciGetProperties() and zeDevicePciGetPropertiesExt()
(open-mpi/hwloc#595 (comment)). ZE reports 63GB/s as expected.
ZES reports 0.25GB/s instead. One explanation could be that one API reports the maximum possible value while the other reports the current (possibly idle) value. However, the ZES doc says "The maximum bandwidth in bytes/sec (sum of all lanes)", so 0.25 doesn't make sense either way.
I don't have access to Aurora to debug further. I tested on other platforms but they seem to have older releases of the runtime (including on your endeavour cluster), and they just report -1 from ZES anyway (ZE is correct there too).