Confusing behavior with calling `mpirun` in subprocess after import fv3gfs #79

nbren12 · 2020-06-17T23:43:14Z

Apparently, it is not possible to import fv3gfs in a Python module and then call mpirun from a subprocess started by that same module. See this minimal example:

>>> import fv3gfs
--------------------------------------------------------------------------
[[38326,1],0]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
  Host: 9600607c6d59

Another transport will be used instead, although this may result in
lower performance.

NOTE: You can disable this warning by setting the MCA parameter
btl_base_warn_component_unused to 0.
--------------------------------------------------------------------------
>>> import subprocess
>>> subprocess.check_call(['mpirun', '--allow-run-as-root',  '-n', '1', 'echo', '1'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.7/subprocess.py", line 363, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['mpirun', '--allow-run-as-root', '-n', '1', 'echo', '1']' returned non-zero exit status 1.

This caused a very confusing test error (ai2cm/fv3net#413). Basically, I was trying to detect whether fv3gfs was installed using a try-except block like this:

try:
    import fv3gfs
except ImportError:
    # fv3gfs is not available in this environment
    INSTALLED = False
else:
    INSTALLED = True

and then skipping the test if fv3gfs is not installed. However, when fv3gfs is installed, the import above broke any attempt to execute the model.
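One way to detect the package without importing it is sketched below. This is not necessarily the workaround mentioned in this issue; it is a minimal sketch that assumes the MPI initialization only happens when fv3gfs is actually imported, so merely looking up the module leaves the process untouched. The helper name `is_installed` and the constant `FV3GFS_INSTALLED` are made up for illustration.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Hypothetical flag for a test module: the lookup does not execute fv3gfs,
# so it should not trigger any import-time MPI initialization.
FV3GFS_INSTALLED = is_installed("fv3gfs")
```

A test could then be skipped based on `FV3GFS_INSTALLED` (for example with pytest's `skipif` marker), while the model itself is still launched through `mpirun` in a subprocess.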

I have a workaround for testing purposes, but this issue seems pretty odd! Any ideas @mcgibbon?

mcgibbon · 2020-06-22T16:50:31Z

Yes. The issue is that doing so results in two MPI initialization calls, which is not allowed. There are a few workaround options:

If your code needs to run the Fortran model and also execute mpirun in a subprocess: as far as I know, MPI does not support this. The initial mpirun command needs to start all controlled processes. Instead, you can split off a subset of the ranks to perform a certain task within the script (e.g. using comm.Split).
If your code does not require the Fortran model, use fv3util instead of fv3gfs.
If your code needs to run the Fortran model, do not treat fv3gfs as an optional dependency in the script. You could perform some kind of check before executing the script to select between a script that uses fv3gfs-python and one that does not, e.g. in a Makefile.
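As a rough illustration of the comm.Split option, the sketch below reserves the last rank for a side task and leaves the remaining ranks for the model. It assumes mpi4py is installed and that the script is started once under mpirun; the helper `task_color` and the choice of which rank to reserve are hypothetical, and whether the model can be run on a sub-communicator is not something this thread confirms.

```python
def task_color(rank: int, size: int) -> int:
    """Return 0 for ranks that run the model and 1 for the reserved last rank.

    With a single rank there is nothing to split off, so it keeps color 0.
    """
    if size > 1 and rank == size - 1:
        return 1
    return 0


def main() -> None:
    # Imported here so that task_color can be used and tested without MPI.
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
    color = task_color(rank, size)
    # Ranks that pass the same color end up in the same sub-communicator.
    sub_comm = comm.Split(color=color, key=rank)
    if color == 0:
        pass  # run the model here, communicating over sub_comm
    else:
        pass  # do the side task here instead of calling mpirun again
    sub_comm.Free()
```

Calling `main()` from a script launched with something like `mpirun -n 7 python script.py` (a hypothetical file name) would give six model ranks and one side-task rank, all started by the single initial mpirun command.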
