Confusing behavior with calling mpirun in subprocess after import fv3gfs #79

Open
nbren12 opened this issue Jun 17, 2020 · 1 comment

@nbren12
Contributor

nbren12 commented Jun 17, 2020

Apparently, it is not possible to import fv3gfs in a Python module and then call mpirun from a subprocess started by that same module. See this minimal example:

>>> import fv3gfs
--------------------------------------------------------------------------
[[38326,1],0]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
  Host: 9600607c6d59

Another transport will be used instead, although this may result in
lower performance.

NOTE: You can disable this warning by setting the MCA parameter
btl_base_warn_component_unused to 0.
--------------------------------------------------------------------------
>>> import subprocess
>>> subprocess.check_call(['mpirun', '--allow-run-as-root',  '-n', '1', 'echo', '1'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.7/subprocess.py", line 363, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['mpirun', '--allow-run-as-root', '-n', '1', 'echo', '1']' returned non-zero exit status 1.

This caused a very confusing test error (ai2cm/fv3net#413). Basically, I was trying to detect whether fv3gfs was installed using a try-except block like this:

try:
    import fv3gfs
    INSTALLED = True
except ImportError:
    INSTALLED = False

and then skipping the test if fv3gfs is not installed. However, when fv3gfs is installed, the import above broke any attempt to execute the model.
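For illustration, here is a minimal sketch of one way to detect the package without importing it, assuming the import itself is what triggers MPI initialization. It is not necessarily the workaround mentioned below. The helper name `module_available` and the flag `FV3GFS_INSTALLED` are hypothetical; `importlib.util.find_spec` is part of the standard library and, for a top-level module name, locates the module without executing it.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module can be located, without importing it."""
    # find_spec returns None when no module with this name is found.
    return importlib.util.find_spec(name) is not None


# Hypothetical flag playing the role of INSTALLED above; fv3gfs is not
# imported here, so this check alone should not initialize MPI.
FV3GFS_INSTALLED = module_available("fv3gfs")
```

A test could then be skipped when `FV3GFS_INSTALLED` is false, while the model itself is still launched through mpirun in a subprocess.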

I have a workaround for testing purposes, but this issue seems pretty odd! Any ideas @mcgibbon?

@mcgibbon
Collaborator

mcgibbon commented Jun 22, 2020

Yes, the issue is that doing so results in two MPI initialization calls, which is not allowed. There are a few different workaround options:

  • If your code needs to run the Fortran model but also execute mpirun in a subprocess, as far as I know MPI does not support this. The initial mpirun command needs to start all controlled processes. You can split off a subset of the ranks to perform a certain task within the script (e.g. using comm.Split).
  • If your code does not require the Fortran model, use fv3util instead of fv3gfs.
  • If your code needs to run the Fortran model, do not use fv3gfs as an optional dependency in the script. You could perform some kind of check before executing the script to select between one which uses fv3gfs-python and one which does not, e.g. in a Makefile.
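
The comm.Split option from the first bullet could look roughly like the sketch below, assuming mpi4py is the MPI binding in use. The function name `split_for_task` and the choice to reserve the first `n_task_ranks` ranks for the side task are illustrative assumptions, not part of fv3gfs.

```python
def split_for_task(comm, n_task_ranks):
    """Split comm into a task group (the first n_task_ranks ranks) and a model group.

    Returns (sub_communicator, is_task_rank).
    """
    rank = comm.Get_rank()
    is_task_rank = rank < n_task_ranks
    # Ranks that pass the same color end up in the same sub-communicator.
    color = 0 if is_task_rank else 1
    # key=rank keeps the original rank ordering inside each sub-communicator.
    sub_comm = comm.Split(color=color, key=rank)
    return sub_comm, is_task_rank


def main():
    # Assumes mpi4py is installed; by default, importing MPI initializes it.
    from mpi4py import MPI

    sub_comm, is_task_rank = split_for_task(MPI.COMM_WORLD, n_task_ranks=1)
    group = "task" if is_task_rank else "model"
    print(f"{group} rank {sub_comm.Get_rank()} of {sub_comm.Get_size()}")
```

To try it, call `main()` from a script started by a single launcher command, for example `mpirun -n 4 python script.py` (the script name is a placeholder), so that one mpirun starts all of the ranks.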
