Use mpirun on glogin/blogin (SLURM) #1208
Hi @joakimkjellsson, did you try setting [...]? I'd need to look more deeply into how to set the actual options. That would need a code change.
Hi @pgierz, I tried `time mpirun $(cat hostfile_srun) 2>&1 &`, so it would use

```
0-287 ./oifs -e ECE3
288-719 ./oceanx
720-739 ./xios.x
740-740 ./rnfma
```

but I would need it to be

```
-np 288 ./oifs -e ECE3 : -np 432 ./oceanx : -np 20 ./xios.x : -np 1 ./rnfma
```

The function [...]

/J
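As an illustration of the translation involved, here is a minimal, hypothetical sketch (not part of ESM-Tools) that turns the rank-range lines of a `hostfile_srun` into the MPMD string that `mpirun` expects; the line format and process counts are taken from this comment:

```python
# Hypothetical helper, for illustration only: convert srun-style hostfile
# lines ("<first_rank>-<last_rank> <command>") into mpirun MPMD syntax.
def srun_hostfile_to_mpirun(lines):
    parts = []
    for line in lines:
        ranks, command = line.split(maxsplit=1)
        first, last = (int(r) for r in ranks.split("-"))
        parts.append("-np %d %s" % (last - first + 1, command))
    return " : ".join(parts)


print(srun_hostfile_to_mpirun([
    "0-287 ./oifs -e ECE3",
    "288-719 ./oceanx",
    "720-739 ./xios.x",
    "740-740 ./rnfma",
]))
# prints: -np 288 ./oifs -e ECE3 : -np 432 ./oceanx : -np 20 ./xios.x : -np 1 ./rnfma
```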
@joakimkjellsson What branch are you on? I'll start from that one, should be quick enough to program.
@pgierz no worries. I've already coded it in. My main question was whether someone had already done it or was planning to do it, in which case I would not do it :-) I renamed the old [...] and wrote a new `write_one_hostfile` suitable for `mpirun`:

```python
def write_one_hostfile(self, hostfile, config):
"""
Gathers previously prepared requirements
(batch_system.calculate_requirements) and writes them to ``self.path``.
Suitable for mpirun launcher
"""
# make an empty string which we will append commands to
mpirun_options = ""
for model in config["general"]["valid_model_names"]:
end_proc = config[model].get("end_proc", None)
start_proc = config[model].get("start_proc", None)
print(' model ', model)
print(' start_proc ', start_proc)
print(' end_proc ', end_proc)
# a model component like oasis3mct does not need cores
# since its technically a library
# So start_proc and end_proc will be None. Skip it
if start_proc == None or end_proc == None:
continue
# number of cores needed
no_cpus = end_proc - start_proc + 1
print(' no_cpus ',no_cpus)
if "execution_command" in config[model]:
command = "./" + config[model]["execution_command"]
elif "executable" in config[model]:
command = "./" + config[model]["executable"]
else:
continue
# the mpirun command is set here.
mpirun_options += (
" -np %d %s :" % (no_cpus, command)
)
mpirun_options = mpirun_options[:-1] # remove trailing ":"
    with open(hostfile, "w") as hostfile_handle:
        hostfile_handle.write(mpirun_options)
```

Already made a few test runs and it seems to work. I'll do some more tests. Then it will end up in the [...].

/J
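For a rough idea of what this produces, here is a toy config with the process ranges from the example earlier in the thread (component names and config layout are simplified guesses, not the real ESM-Tools config structure):

```python
# Toy config, for illustration only; a real ESM-Tools config contains far more.
config = {
    "general": {"valid_model_names": ["oifs", "nemo", "xios", "rnfmap"]},
    "oifs": {"start_proc": 0, "end_proc": 287, "execution_command": "oifs -e ECE3"},
    "nemo": {"start_proc": 288, "end_proc": 719, "executable": "oceanx"},
    "xios": {"start_proc": 720, "end_proc": 739, "executable": "xios.x"},
    "rnfmap": {"start_proc": 740, "end_proc": 740, "executable": "rnfma"},
}

# `self` is not used inside the function above, so None is fine for this sketch.
write_one_hostfile(None, "hostfile_mpirun", config)
# hostfile_mpirun now contains a single line equivalent to:
#   -np 288 ./oifs -e ECE3 : -np 432 ./oceanx : -np 20 ./xios.x : -np 1 ./rnfma
```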
Perfect, thanks for figuring that out. Let us know when you are ready to merge and we can see if we can improve in terms of generalization of the [...].
I made the change to [...]. When Sebastian is back we might do some cleaning etc. and then merge this fix branch into [...]. Cheers!
Good afternoon all,

`glogin` (GWDG Emmy) has undergone some hardware and software upgrades recently. Since the upgrade, I find jobs launched with `srun` are considerably slower than jobs launched with `mpirun`. The support team recommends `mpirun`. So I'd like to use `mpirun`.

But I can't work out if ESM-Tools can do it. There is an `mpirun.py` file with a function to write a hostfile for `mpirun`, but as far as I can see this function is never used. If we use SLURM, then it seems that ESM-Tools will always build a `hostfile_srun` and then launch with `srun`.
My idea would be to have something like this in `slurm.py`. Line 65 is currently:

[...]

but it should be:

[...]

and then the two functions would be slightly different.
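A hypothetical sketch of the kind of dispatch being described, choosing the hostfile writer based on a launcher setting instead of always calling the srun variant; the `computer`/`launcher` keys and the two method names are placeholders, not actual ESM-Tools code:

```python
# Hypothetical sketch only -- not the actual slurm.py code.
# Pick the hostfile writer based on an (assumed) launcher setting.
launcher = config.get("computer", {}).get("launcher", "srun")
if launcher == "mpirun":
    self.write_one_hostfile_mpirun(hostfile, config)
else:
    self.write_one_hostfile_srun(hostfile, config)
```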
One benefit with `mpirun` would be that heterogeneous parallelisation becomes very easy since we can do:

[...]

although I'm not sure and would have to double-check exactly how it should be done on `glogin`.
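For illustration, such a heterogeneous invocation would presumably look something like the MPMD syntax shown in the comments earlier in the thread (process counts taken from there; the exact form on `glogin` is unverified):

```
mpirun -np 288 ./oifs -e ECE3 : -np 432 ./oceanx : -np 20 ./xios.x : -np 1 ./rnfma
```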
Before I venture down this path though, I just want to check: is it already possible to use `mpirun` but I'm just too dense to figure out how? If not, is someone else already working on a similar solution?

Cheers,
Joakim