Have checked that there is not already an existing issue for what you are reporting.
Expected behavior and actual behavior
I'm trying to run the parallel command on two nodes of an HPC cluster using the hostnames option of parallel initialize. When I specify the hostnames, I get the error "child process 0002 Exited with error -700- while running the command/dofile (view log)...". The logfile __pll[pll_id]_do0002.log is empty.
The command works fine without the hostnames option (working only on one node).
Steps to reproduce the problem
The following code is saved in the file test_parallel.do:
parallel initialize 2, f h("localhost cn07")
sysuse auto
parallel, by(foreign) : egen maxp = max(price)
The code is launched with the command stata test_parallel.do inside a SLURM batch file (which requests the node cn07).
System information
Stata version and flavor (e.g. v14 MP): Stata16-MP
OS type and version (e.g. Windows 10): CentOS Linux release 7.5.1804
Parallel version: 1.20.0 19mar2019
Output from creturn list:
Working with Slurm can be tricky sometimes. One key issue I've seen in the past is nodes' access to filesystems. For parallel to work, all nodes need to have I/O access to the data and tempfiles. This issue seems to be a bug. Thanks for reporting.
Normally, the nodes have I/O access to the data and tempfiles: the data are on a file system shared among the nodes, and I set the TMPDIR variable to a folder on this shared file system (originally to avoid filling up the node's local disk).
The command tempfile junk; display "`junk'" prints a tempfile path inside the shared folder that I specified in the TMPDIR variable. So it seems Stata recognizes the shared path. Besides, the logfiles __pllul97ezlin1__do0001.log and __pllul97ezlin1__do0002.log are in this folder.
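For anyone who wants to double-check the filesystem side from the batch script itself, before Stata starts, here is a minimal sketch. The helper name check_dir_rw is hypothetical (it is not part of parallel or SLURM); it only tests that a directory exists and that a file can be written, read back, and deleted there. The remote check in the comments assumes that SSH between the allocated nodes is permitted, which may not hold on every cluster.

```shell
#!/bin/sh
# Hypothetical helper (not part of parallel or SLURM): returns 0 if the
# directory exists and a file can be created, read back, and removed in it.
check_dir_rw() {
    dir=$1
    [ -d "$dir" ] || return 1
    marker="$dir/.rw_check_$$"
    # Try to write a small marker file; hide the shell's error message.
    { echo ok > "$marker"; } 2>/dev/null || return 1
    # Read the marker back to confirm the write actually landed.
    if [ "$(cat "$marker" 2>/dev/null)" != "ok" ]; then
        rm -f "$marker"
        return 1
    fi
    rm -f "$marker"
}

# Possible use inside the batch file, before launching Stata
# (assumes TMPDIR is exported and SSH to cn07 is allowed):
#   check_dir_rw "$TMPDIR" || echo "TMPDIR is not writable on $(hostname)"
#   ssh cn07 "test -w '$TMPDIR'" || echo "TMPDIR is not writable on cn07"
```

If both checks pass, the shared folder is at least reachable and writable from each node, which would point away from a plain filesystem problem as the cause of the -700- error.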