@with_jobs does not work with @cmd in environments that use jsrun #497

Open
klywang opened this issue Apr 12, 2021 · 1 comment
Labels
bug (Something isn't working), cluster submission (Enhancements to the submission process)

Comments

Contributor

klywang commented Apr 12, 2021

Description

When an operation decorated with both @with_job and @cmd is run in an environment that uses jsrun to launch jobs on the compute node (e.g., Summit), the operation fails.

To reproduce

The following examples will fail:

@Project.operation
@flow.with_job
@flow.cmd
# ... pre and post conditions ...
def foo(job):
    return ('trap "some commands --args" EXIT')

@Project.operation
@flow.cmd
# ... pre and post conditions ...
def foo(job):
    return ('trap "cd {}; some commands --args" EXIT'.format(job.ws))

The following will run:

@Project.operation
@flow.cmd
# ... pre and post conditions ...
def gen_pqr(job):
    return ("cd {}; some commands --args".format(job.ws))

Error output

bash-4.2$ jsrun -n1 python flowprojects/project.py run -o gen_pqr
[h50n13:02512] PMIX ERROR: INVALID-NAMESPACE in file dstore_base.c at line 1739
Error (No such file or directory) executing process: trap
Using environment configuration: SummitEnvironment
ERROR: Encountered error during program execution: 'Command 'jsrun -n 1 -a 1 -c 1 -g 0  -d packed -b rs  trap "cd /path/to/job/ws/; some commands --args" EXIT' returned non-zero exit status 210.'
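
The "executing process: trap" message suggests that jsrun tries to execute the first word of the command directly rather than passing the line to a shell. Since trap is a shell builtin and not a program on disk, that lookup fails. The sketch below only illustrates this reading of the error. It assumes jsrun behaves like an exec-style launcher, and the helper names first_exec_token and wrap_in_shell are hypothetical, not part of signac-flow or jsrun.

```python
import shlex


def first_exec_token(command):
    """Return the program name an exec-style launcher would try to run.

    Assumption: the launcher splits the command line into words and
    executes the first one directly, without involving a shell.
    """
    return shlex.split(command)[0]


def wrap_in_shell(command, shell="bash"):
    """Wrap a command line so that a shell interprets builtins such as trap."""
    return "{} -c {}".format(shell, shlex.quote(command))


cmd = 'trap "cd /path/to/job/ws/; some commands --args" EXIT'

# Without a shell, the launcher would look for an executable named "trap".
print(first_exec_token(cmd))

# Wrapped in a shell, the launcher executes bash, which understands trap.
print(first_exec_token(wrap_in_shell(cmd)))
```

Under this assumption, any command that starts with a shell builtin would need to be interpreted by a shell before or instead of being handed to the launcher.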

System configuration

  • Operating System: Red Hat Enterprise Linux (RHEL) version 7.6
  • Version of Python: 3.7.0
  • Version of signac: 1.6.0
  • Version of signac-flow: 0.12.0
klywang changed the title from "@flow.with_jobs does not work with @cmd in environments that use jsrun" to "@with_jobs does not work with @cmd in environments that use jsrun" Apr 12, 2021
csadorf added the bug and cluster submission labels Apr 13, 2021
Contributor Author

klywang commented Apr 13, 2021

If I understand the issue correctly, addressing #73 should help with this bug.

We might want this:

return 'trap "cd $(pwd)" EXIT && cd {} && {}'.format(job.ws, func(job))

to be separate from what we submit. For example, rather than submitting jsrun -n 1 -a 1 -c 1 -g 0 -d packed -b rs trap "cd /path/to/job/ws/; some commands --args" EXIT, signac-flow would submit trap "cd {job.ws}; jsrun -n 1 -a 1 -c 1 -g 0 -d packed -b rs some commands --args" EXIT, so that the shell interprets trap and jsrun launches only the actual command.
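
One way to read this proposal is sketched below: the trap/cd wrapper from the return statement above stays at the shell level, and the launcher prefix is placed only in front of the user's command. This is an illustration under that assumption, not signac-flow's implementation. The function compose_submission and its parameters are hypothetical names.

```python
import shlex


def compose_submission(prefix, workspace, user_cmd):
    """Build a shell line with the launcher prefix inside the trap/cd wrapper.

    prefix    -- launcher invocation, e.g. the jsrun options from the error output
    workspace -- the job workspace directory (job.ws in the operations above)
    user_cmd  -- the command string returned by the @cmd operation
    """
    # Only the user's command is handed to the launcher.
    launched = "{} {}".format(prefix, user_cmd).strip()
    # The wrapper mirrors the with_job form quoted in this comment; the
    # shell, not the launcher, interprets trap and cd.
    return 'trap "cd $(pwd)" EXIT && cd {} && {}'.format(
        shlex.quote(workspace), launched
    )


line = compose_submission(
    "jsrun -n 1 -a 1 -c 1 -g 0 -d packed -b rs",
    "/path/to/job/ws/",
    "some commands --args",
)
print(line)
```

With this ordering the first word of the submitted line is trap, which the submission script's shell handles, and jsrun only ever receives "some commands --args".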
