I cloned the repository FinEtoolsDDParallel.jl from GitHub.
I activated the environment and instantiated the packages:
    julia -e 'using Pkg; Pkg.activate("."); Pkg.instantiate()'
I ran Julia interactively to install mpiexecjl:
    julia --project=.
I also configured MPI.jl to use the system MPI binary: MPIPreferences.use_system_binary().
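The setup steps so far can also be scripted instead of typed into an interactive session. This is a minimal sketch, not the exact sequence used above. It assumes MPI.jl and MPIPreferences.jl are dependencies of the active project, which the issue does not show. The function name `setup_mpi_for_project` is made up for illustration. `MPI.install_mpiexecjl` installs the wrapper into the `bin` directory of the first Julia depot by default, which matches the `~/a64fx/depot/bin/mpiexecjl` path used when running the example.

```shell
# Hypothetical helper: prepare a Julia project for MPI runs.
# Usage: setup_mpi_for_project [project_dir]   (defaults to the current directory)
setup_mpi_for_project() {
    project_dir="${1:-.}"
    # Download and precompile the packages recorded for the project.
    julia --project="$project_dir" -e 'using Pkg; Pkg.instantiate()' &&
    # Record the preference for the system MPI library (assumes MPIPreferences is available).
    julia --project="$project_dir" -e 'using MPIPreferences; MPIPreferences.use_system_binary()' &&
    # Install the mpiexecjl wrapper; force=true overwrites an existing copy.
    julia --project="$project_dir" -e 'using MPI; MPI.install_mpiexecjl(force=true)'
}
```

Each step runs in a fresh Julia process, so the preference written by `use_system_binary()` is already in effect when MPI.jl is loaded in the last step.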
I ran srun to get an interactive session:
    srun -p short -n 10 --ntasks-per-node=1 --pty bash
I ran the example:
    cd FinEtoolsDDParallel.jl/examples/
    ~/a64fx/depot/bin/mpiexecjl -n 4 julia --project=. heat/Poisson2D_cg_mpi_driver.jl
After several minutes, the job was terminated. The error message is below.
Note well: On my laptop this example runs to completion in ~70 seconds.
julia: symbol lookup error: /lustre/home/pkrysl/a64fx/depot/artifacts/58dcf187642cdfbafb3581993ca3d8de565acc78/lib/openmpi/mca_pmix_pmix3x.so: undefined symbol: opal_libevent2022_evthread_use_pthreads
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
[1972752] signal (15): Terminated
in expression starting at /lustre/home/pkrysl/a64fx/FinEtoolsDDParallel.jl/examples/heat/Poisson2D_cg_mpi_driver.jl:160
_ZN12_GLOBAL__N_117InterleavedAccess13runOnFunctionERN4llvm8FunctionE at /lustre/software/julia/julia-1.10.3/lib/julia/libLLVM-15jl.so (unknown line)
_ZN12_GLOBAL__N_117InterleavedAccess13runOnFunctionERN4llvm8FunctionE at /lustre/software/julia/julia-1.10.3/lib/julia/libLLVM-15jl.so (unknown line)
unknown function (ip: (nil))
Allocations: 14621931 (Pool: 14605380; Big: 16551); GC: 19
julia: symbol lookup error: /lustre/home/pkrysl/a64fx/depot/artifacts/58dcf187642cdfbafb3581993ca3d8de565acc78/lib/openmpi/mca_pmix_pmix3x.so: undefined symbol: opal_libevent2022_evthread_use_pthreads
julia: symbol lookup error: /lustre/home/pkrysl/a64fx/depot/artifacts/58dcf187642cdfbafb3581993ca3d8de565acc78/lib/openmpi/mca_pmix_pmix3x.so: undefined symbol: opal_libevent2022_evthread_use_pthreads
--------------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[7833,1],1]
Exit code: 127
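Several ranks report the same loader failure, so it can help to condense the log before digging further. This is a minimal shell sketch, assuming the job output was saved to a file. The function name `summarize_symbol_errors` and the file name `job.log` in the usage note are made up for illustration.

```shell
# Condense dynamic-loader failures found in a job log.
# Prints one line per distinct (library, symbol) pair, prefixed by
# the number of times that pair occurs.
summarize_symbol_errors() {
    grep 'symbol lookup error:' "$1" \
        | sed -E 's/^.*symbol lookup error: ([^:]+): undefined symbol: (.*)$/\1 \2/' \
        | sort | uniq -c
}
```

Calling `summarize_symbol_errors job.log` lists each offending library together with the symbol it could not resolve. In the log above the library sits under the Julia depot's `artifacts` directory rather than under a system path, which is worth keeping in mind when comparing the system and JLL configurations.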
PetrKryslUCSD changed the title from "opal_libevent2022_evthread_use_pthreads" to "Ookami: MPI error opal_libevent2022_evthread_use_pthreads" on Jun 6, 2024.
There is something wrong with the system MPI binary. Running julia --project -e 'using MPIPreferences; MPIPreferences.use_system_binary()' before the example leads to the above error; running julia --project -e 'using MPIPreferences; MPIPreferences.use_jll_binary("OpenMPI_jll")' does not.
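To make the comparison between the two settings repeatable, the switch can be wrapped in small shell helpers. This is a sketch under the assumption that it is run from the project directory and that MPI.jl and MPIPreferences.jl are available there. The function names are made up for illustration. The Julia calls `use_system_binary`, `use_jll_binary` and `MPI.versioninfo` are the ones provided by MPIPreferences.jl and MPI.jl.

```shell
# Select the MPI library for the project in the current directory.
# Each helper starts a fresh Julia process, so the stored preference
# is picked up by the next `julia --project` invocation.
use_system_mpi() {
    julia --project -e 'using MPIPreferences; MPIPreferences.use_system_binary()'
}

use_jll_mpi() {
    julia --project -e 'using MPIPreferences; MPIPreferences.use_jll_binary("OpenMPI_jll")'
}

# Print which MPI library and version MPI.jl actually loads.
show_mpi() {
    julia --project -e 'using MPI; MPI.versioninfo()'
}
```

Running `use_jll_mpi && show_mpi` followed by `use_system_mpi && show_mpi` shows which `libmpi` each setting resolves to, which may help narrow down where the mismatch comes from.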
It seems something does indeed go wrong when trying to hook into the system libs. Maybe look into https://github.com/giordano/julia-on-ookami, as IIRC @giordano did quite extensive testing on that machine and has used it a lot.
The steps listed at the top of this issue are what I did on Ookami after I logged on. (Some more details are at https://iacs-group.slack.com/archives/C016DRQ321M.)