Adding SLURM compatibility to BenchExec #995

Merged: 62 commits from the slurm branch into sosy-lab:main on Feb 20, 2024

Conversation

@leventeBajczi (Contributor)

I've added experimental support for the SLURM workload manager for HPC as a new contrib/ entry. I don't know whether this contribution is interesting to other users of BenchExec, but Hungarian researchers can request access to Komondor (https://hpc.kifu.hu/hu/komondor) for research projects, and it uses SLURM to schedule jobs across its nodes.

As an overview of what I've implemented (I based these modifications on the AWS integration):

  1. A new benchmarking entry point contrib/slurm-benchmark.py, which enables SLURM compatibility via the flag --slurm (a rough sketch of the pattern follows after this list)
  2. A new executor contrib/slurm/slurmexecutor.py, which takes a benchmark and runs it using srun, possibly in a Singularity container (for dependency management)
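
To illustrate the entry-point pattern: a minimal sketch based on the structure of the AWS integration, with illustrative names rather than the actual contrib code:

```python
# Hypothetical sketch; contrib/slurm-benchmark.py may differ in details.
import benchexec.benchexec


class SlurmBenchmark(benchexec.benchexec.BenchExec):
    """BenchExec wrapper that can delegate run execution to SLURM."""

    def create_argument_parser(self):
        parser = super().create_argument_parser()
        # --slurm switches from local execution to the SLURM executor.
        parser.add_argument(
            "--slurm",
            action="store_true",
            help="execute runs via the SLURM workload manager",
        )
        return parser

    def load_executor(self):
        if self.config.slurm:
            # Assumes contrib/slurm/ is on sys.path, as for the other
            # contrib executors.
            from slurm import slurmexecutor as executor
        else:
            executor = super().load_executor()
        return executor


if __name__ == "__main__":
    benchexec.benchexec.main(SlurmBenchmark())
```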

Currently, most of it works (the exception is keeping result files other than the log, but that should not be hard to implement from here), but I feel it is very easy for the integration to break. I'm wondering whether it is worth putting in some extra effort to make this more production-ready. I'm eagerly awaiting feedback; I'd be happy to work on it further if there is interest.

@PhilippWendler (Member)

I am pretty sure that there are also other people who would like to use it, so this is welcome!

I know about Slurm, but I do not have an installation here, so I can't test or maintain this. But we have the contrib/ directory precisely for sharing such additional components outside of the supported core of BenchExec, and I would like to merge this here. Ideally, you would be willing to somewhat maintain it in the future?

The general architecture of this PR matches what we do in similar cases and is certainly what I would have recommended.

So besides what you think should be added, I would focus mostly on some documentation.

Can you explain a little bit more about the effect of using Slurm for users who are familiar with BenchExec? I assume, for example, that things like core allocations and measurements are handled purely by Slurm, so users wouldn't get what they know from BenchExec, but instead what Slurm provides?

Are there any particular requirements beyond a standard Slurm installation or do users need to do something specific?

Ideally these questions together with an overview of supported/unsupported features of BenchExec would be discussed in a readme.

@leventeBajczi (Contributor, Author)

Sure, I'd be happy to maintain it! I'll prioritize writing documentation first.

Everything resource-related is handled by SLURM: both the constraints and the measurements. SLURM seems to use cgroups to account for resource usage, so hopefully it's reliable, but I do have some problems right now with the resolution of these parameters - for example, time limits are only handled to the nearest minute on the Komondor instance. Of course, BenchExec will still flag such runs as timeouts (CPU time > time limit), but this represents quite a big overhead of additional computation.
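
To make the resolution problem concrete (illustrative numbers, not the actual executor code): srun/sbatch time limits have one-minute resolution with second values rounded up, so a fine-grained BenchExec limit has to be over-approximated, and the difference is potentially wasted computation:

```python
import math

def slurm_time_limit_minutes(limit_s):
    # SLURM time limits have one-minute resolution; second values are
    # rounded up, so this is the smallest enforceable over-approximation.
    return math.ceil(limit_s / 60)

limit_s = 15                                 # BenchExec time limit
minutes = slurm_time_limit_minutes(limit_s)  # -> 1, i.e. 60 s of walltime
# The run may keep computing for up to 60 - 15 = 45 s beyond the intended
# limit before SLURM kills it; BenchExec then still reports a timeout,
# because the measured CPU time exceeds the 15 s limit.
```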

Besides testing them with our own tool (Theta), do you have any suggestions on what/how to test?

@PhilippWendler (Member)

I don't have any particular testing suggestions. If you want to test more tools, you could relatively easily download the SV-COMP participants and try them? All data required for this is online, but I wouldn't require this as a condition for merging this PR.

@leventeBajczi (Contributor, Author)

leventeBajczi commented Feb 20, 2024

I think I addressed all of your comments, thanks for the thorough review! I ran some tests just now to see if I've broken anything, and everything seems to work.
I've left two of the threads unresolved where I still need some input from you, but otherwise everything seems to be in order. (GitHub had failed to load your answers earlier; I can see them now.)

@leventeBajczi
Copy link
Contributor Author

I've added some tweaks, ran some tests, and everything seems to be in order. If you agree, please go ahead with the merge. Thanks!

(Five review threads on contrib/slurm/slurmexecutor.py, outdated and resolved.)
@leventeBajczi (Contributor, Author)

I've reworked the commands to not use shells at all; please re-check. Hopefully this will be more robust.
Also, I checked: Singularity, like Docker and other containerization tools, treats all parameters after the container name as parameters to pass to the container, so Singularity runs will work well. I'm less sure about non-Singularity runs, but in my testing they seem to work just fine, and the command correctly receives even the parameters that overlap with those of srun.
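
Roughly, the shell-free invocation now looks like this (hypothetical flags and file names, not the exact executor code):

```python
import subprocess

# Hypothetical values; the actual executor derives these from the benchmark.
srun_cmd = ["srun", "--time", "1", "--cpus-per-task", "2"]
container = ["singularity", "exec", "tool-deps.sif"]  # optional wrapper
tool_cmd = ["./mytool", "--witness", "some file.graphml"]

# One flat argument list, executed without a shell: no quoting or injection
# issues, and Singularity forwards everything after the image name verbatim
# as the command to run inside the container.
result = subprocess.run(
    srun_cmd + container + tool_cmd, capture_output=True, text=True
)
print(result.stdout)
```

The same list concatenation works for runs without a container by leaving the middle list empty.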

@PhilippWendler (Member) left a comment

Thanks! The argument handling should be safe now, which is a great improvement.

Last round of minor stuff, and then only the question about the desired Python compatibility is left.

(Four further review threads on contrib/slurm/slurmexecutor.py, resolved.)
@PhilippWendler (Member)

Thanks again for the submission, and in particular for your responsiveness and fast integration of suggestions!

@PhilippWendler merged commit 71c766d into sosy-lab:main on Feb 20, 2024; 3 of 4 checks passed.
@leventeBajczi deleted the slurm branch on February 20, 2024 at 12:45.