Feature request: support for exporting environment variables with parallel launchers #3207

Open
casparvl opened this issue May 29, 2024 · 1 comment


casparvl commented May 29, 2024

Some software requires environment variables to run - e.g. PyTorch's distributed framework requires MASTER_PORT (among others) to be set. As discussed on Slack, this is currently challenging if the test developer doesn't know the configured launcher in advance.

For example, if we know that OpenMPI's mpirun will be the launcher, we can do

self.job.launcher.options = ['-x MASTER_PORT']

But if we are writing a test with the purpose of it being reused (e.g. a test for the hpctestlib), it would be nice to have a way of specifying this in a launcher-agnostic way. E.g.

test.env_vars['MASTER_PORT'] = '1234'
self.job.launcher.export_var = ['MASTER_PORT']

or

self.job.launcher.export_var['MASTER_PORT'] = ['1234']

(the 2nd is probably more convenient, but not sure which API is easiest to support from the ReFrame side).

ReFrame would then abstract how each particular launcher exports environment variables. E.g. for OpenMPI, the ReFrame backend would add -x MASTER_PORT=1234 as an extra launcher argument, whereas for srun it would add --export=MASTER_PORT=1234.
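A minimal sketch of what such an abstraction could look like, written as a standalone function rather than as ReFrame code. The function name `export_options` and the launcher names used as keys are hypothetical and not part of ReFrame's API. For srun, the sketch assumes that prefixing the list with `ALL` is wanted so that the rest of the environment is still propagated; this differs slightly from the bare `--export=MASTER_PORT=1234` form above.

```python
def export_options(launcher, env_vars):
    """Translate a dict of environment variables into launcher options.

    Hypothetical helper for illustration only; not an existing ReFrame API.
    """
    if not env_vars:
        return []

    if launcher == 'mpirun':
        # Open MPI style: one '-x NAME=value' pair per variable
        opts = []
        for name, value in env_vars.items():
            opts += ['-x', f'{name}={value}']
        return opts

    if launcher == 'srun':
        # Slurm style: a single comma-separated --export list.
        # Assumption: 'ALL' is prepended to keep the rest of the environment.
        pairs = ','.join(f'{name}={value}' for name, value in env_vars.items())
        return [f'--export=ALL,{pairs}']

    raise ValueError(f'no export rule known for launcher {launcher!r}')


mpirun_opts = export_options('mpirun', {'MASTER_PORT': '1234'})
srun_opts = export_options('srun', {'MASTER_PORT': '1234'})
```

With something like this in the backend, a test would only need to state which variables it wants exported, and the configured launcher would decide how to pass them.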

Note that right now, I worked around this issue by making a wrapper shell script that sets the environment variables, similar to what is used here by CSCS in their PyTorch test.
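For illustration, the wrapper approach can be sketched roughly as follows. This is not the CSCS script nor the one I actually use; the function name `make_wrapper` and the example command are made up, and the sketch only builds the text of a shell wrapper that exports the variables before running the real command.

```python
import shlex


def make_wrapper(env_vars, command):
    """Return the text of a shell wrapper that exports env_vars and then
    runs command. Hypothetical helper for illustration only."""
    lines = ['#!/bin/bash']
    for name, value in env_vars.items():
        # Quote the value so that spaces or special characters survive
        lines.append(f'export {name}={shlex.quote(str(value))}')
    # Replace the wrapper process with the actual executable
    lines.append('exec ' + ' '.join(shlex.quote(arg) for arg in command))
    return '\n'.join(lines) + '\n'


wrapper_text = make_wrapper({'MASTER_PORT': '1234'}, ['python', 'train.py'])
```

The launcher then starts the wrapper instead of the executable, so every rank sets the variables itself and no launcher-specific export option is needed.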

@casparvl casparvl changed the title from "Feature request: support for exporting environment variables" to "Feature request: support for exporting environment variables with parallel launchers" May 29, 2024
@vkarak vkarak moved this to Todo in ReFrame Backlog Jun 3, 2024
Contributor

vkarak commented Jun 3, 2024

I think that

self.job.launcher.env_vars = {'MASTER_PORT': '1234'}

or

self.job.launcher.env_vars['MASTER_PORT'] = '1234'

is the best option and matches the syntax of the test's env_vars.

@vkarak vkarak added this to the ReFrame 4.8 milestone Nov 20, 2024