Segmentation fault on large linear systems #218
@amontoison I have no idea what that function even does 😢 Can you run it through the address sanitizer?
@jfowkes I compiled SPRAL with
[2813840] signal (11.1): Erreur de segmentation
in expression starting at /home/montalex/JuMP-dev/jump.jl:24
dgemm_beta_ZEN at /home/montalex/.julia/artifacts/93ddb84060b49f38ec59d4b04a3109fedc4577d2/lib/libopenblas.so (unknown line)
Allocations: 167241614 (Pool: 167213196; Big: 28418); GC: 71
[2814581] signal (11.1): Erreur de segmentation
in expression starting at /home/montalex/JuMP-dev/jump.jl:24
mkl_blas_def_dgemm_mscale at /home/montalex/.julia/artifacts/347e4bf25d69805922225ce6bf819ef0b8715426/lib/libmkl_def.so.2 (unknown line)
mkl_blas_def_xdgemm_bdz at /home/montalex/.julia/artifacts/347e4bf25d69805922225ce6bf819ef0b8715426/lib/libmkl_def.so.2 (unknown line)
mkl_blas_def_xdgemm at /home/montalex/.julia/artifacts/347e4bf25d69805922225ce6bf819ef0b8715426/lib/libmkl_def.so.2 (unknown line)
mkl_blas_dgemm_omp_driver_v1 at /home/montalex/.julia/artifacts/347e4bf25d69805922225ce6bf819ef0b8715426/lib/libmkl_intel_thread.so.2 (unknown line)
mkl_blas_dgemm at /home/montalex/.julia/artifacts/347e4bf25d69805922225ce6bf819ef0b8715426/lib/libmkl_intel_thread.so.2 (unknown line)
dgemm_ at /home/montalex/.julia/artifacts/347e4bf25d69805922225ce6bf819ef0b8715426/lib/libmkl_intel_lp64.so.2 (unknown line)
mkl_blas__dgemm at /home/montalex/.julia/artifacts/347e4bf25d69805922225ce6bf819ef0b8715426/lib/libmkl_rt.so (unknown line)
spral_c_dgemm at /workspace/srcdir/spral/builddir/../src/ssids/cpu/cpu_iface.f90:110
host_gemm<double> at /workspace/srcdir/spral/builddir/../src/ssids/cpu/kernels/wrappers.cxx:27
form_contrib at /workspace/srcdir/spral/builddir/../src/ssids/cpu/kernels/ldlt_app.cxx:1181
_ZN5spral5ssids3cpu17ldlt_app_internal4LDLTIdLi32ENS2_10CopyBackupIdNS1_14BuddyAllocatorIdSaIdEEEEELb1ELb0ES7_E18run_elim_unpivotedEiiPiPdiSB_RNS2_10ColumnDataIdNS5_IiS6_EEEERS8_SA_RKNS1_18cpu_factor_optionsEidSB_iRSt6vectorINS1_9WorkspaceESaISL_EERKS7_._omp_fn.4 at /workspace/srcdir/spral/builddir/../src/ssids/cpu/kernels/ldlt_app.cxx:1966
GOMP_task at /workspace/srcdir/gcc-13.2.0/libgomp/task.c:584
run_elim_unpivoted at /workspace/srcdir/spral/builddir/../src/ssids/cpu/kernels/ldlt_app.cxx:1943
factor at /workspace/srcdir/spral/builddir/../src/ssids/cpu/kernels/ldlt_app.cxx:2318
ldlt_app_factor<double, spral::ssids::cpu::BuddyAllocator<double, std::allocator<double> > > at /workspace/srcdir/spral/builddir/../src/ssids/cpu/kernels/ldlt_app.cxx:2530
factor_node_indef<double, spral::ssids::cpu::BuddyAllocator<double, std::allocator<double> > > at /workspace/srcdir/spral/builddir/../src/ssids/cpu/factor.hxx:60
factor_node<false, double, spral::ssids::cpu::BuddyAllocator<double, std::allocator<double> > > at /workspace/srcdir/spral/builddir/../src/ssids/cpu/factor.hxx:184
_ZN5spral5ssids3cpu14NumericSubtreeILb0EdLm8388608ENS1_11AppendAllocIdEEEC2ERKNS1_15SymbolicSubtreeEPKdSA_PPvRKNS1_18cpu_factor_optionsERNS1_11ThreadStatsE._omp_fn.1 at /workspace/srcdir/spral/builddir/../src/ssids/cpu/NumericSubtree.hxx:193
GOMP_task at /workspace/srcdir/gcc-13.2.0/libgomp/task.c:584
NumericSubtree at /workspace/srcdir/spral/builddir/../src/ssids/cpu/NumericSubtree.hxx:162
spral_ssids_cpu_create_num_subtree_dbl at /workspace/srcdir/spral/builddir/../src/ssids/cpu/NumericSubtree.cxx:52
factor at /workspace/srcdir/spral/builddir/../src/ssids/cpu/subtree.f90:271
__spral_ssids_fkeep_MOD_inner_factor_cpu._omp_fn.2 at /workspace/srcdir/spral/builddir/../src/ssids/fkeep.F90:150
GOMP_taskgroup_end at /workspace/srcdir/gcc-13.2.0/libgomp/task.c:2330
__spral_ssids_fkeep_MOD_inner_factor_cpu._omp_fn.1 at /workspace/srcdir/spral/builddir/../src/ssids/fkeep.F90:143
GOMP_parallel at /workspace/srcdir/gcc-13.2.0/libgomp/parallel.c:178
__spral_ssids_fkeep_MOD_inner_factor_cpu._omp_fn.0 at /workspace/srcdir/spral/builddir/../src/ssids/fkeep.F90:134
GOMP_parallel at /workspace/srcdir/gcc-13.2.0/libgomp/parallel.c:178
inner_factor_cpu at /workspace/srcdir/spral/builddir/../src/ssids/fkeep.F90:132
ssids_factor_ptr64_double at /workspace/srcdir/spral/builddir/../src/ssids/ssids.f90:1049
ssids_factor_ptr32_double at /workspace/srcdir/spral/builddir/../src/ssids/ssids.f90:760
spral_ssids_factor_ptr32 at /workspace/srcdir/spral/builddir/../interfaces/C/ssids.f90:574
_ZN5Ipopt20SpralSolverInterface10MultiSolveEbPKiS2_iPdbi at /home/montalex/.julia/artifacts/fae193a058a11ca67f4e685cbe866727bc5b4c00/lib/libipopt.so (unknown line)
_ZN5Ipopt16TSymLinearSolver10MultiSolveERKNS_9SymMatrixERSt6vectorINS_8SmartPtrIKNS_6VectorEEESaIS8_EERS4_INS5_IS6_EESaISC_EEbi at /home/montalex/.julia/artifacts/fae193a058a11ca67f4e685cbe866727bc5b4c00/lib/libipopt.so (unknown line)
_ZN5Ipopt18StdAugSystemSolver10MultiSolveEPKNS_9SymMatrixEdPKNS_6VectorEdS6_dPKNS_6MatrixES6_dS9_S6_dRSt6vectorINS_8SmartPtrIS5_EESaISC_EESF_SF_SF_RSA_INSB_IS4_EESaISG_EESJ_SJ_SJ_bi at /home/montalex/.julia/artifacts/fae193a058a11ca67f4e685cbe866727bc5b4c00/lib/libipopt.so (unknown line)
_ZN5Ipopt15AugSystemSolver5SolveEPKNS_9SymMatrixEdPKNS_6VectorEdS6_dPKNS_6MatrixES6_dS9_S6_dRS5_SA_SA_SA_RS4_SB_SB_SB_bi at /home/montalex/.julia/artifacts/fae193a058a11ca67f4e685cbe866727bc5b4c00/lib/libipopt.so (unknown line)
_ZN5Ipopt22LeastSquareMultipliers20CalculateMultipliersERNS_6VectorES2_ at /home/montalex/.julia/artifacts/fae193a058a11ca67f4e685cbe866727bc5b4c00/lib/libipopt.so (unknown line)
_ZN5Ipopt25DefaultIterateInitializer18least_square_multsERKNS_10JournalistERNS_8IpoptNLPERNS_9IpoptDataERNS_25IpoptCalculatedQuantitiesERKNS_8SmartPtrINS_22EqMultiplierCalculatorEEEd at /home/montalex/.julia/artifacts/fae193a058a11ca67f4e685cbe866727bc5b4c00/lib/libipopt.so (unknown line)
_ZN5Ipopt25DefaultIterateInitializer18SetInitialIteratesEv at /home/montalex/.julia/artifacts/fae193a058a11ca67f4e685cbe866727bc5b4c00/lib/libipopt.so (unknown line)
_ZN5Ipopt14IpoptAlgorithm18InitializeIteratesEv at /home/montalex/.julia/artifacts/fae193a058a11ca67f4e685cbe866727bc5b4c00/lib/libipopt.so (unknown line)
_ZN5Ipopt14IpoptAlgorithm8OptimizeEb at /home/montalex/.julia/artifacts/fae193a058a11ca67f4e685cbe866727bc5b4c00/lib/libipopt.so (unknown line)
_ZN5Ipopt16IpoptApplication13call_optimizeEv at /home/montalex/.julia/artifacts/fae193a058a11ca67f4e685cbe866727bc5b4c00/lib/libipopt.so (unknown line)
_ZN5Ipopt16IpoptApplication11OptimizeNLPERKNS_8SmartPtrINS_3NLPEEERNS1_INS_16AlgorithmBuilderEEE at /home/montalex/.julia/artifacts/fae193a058a11ca67f4e685cbe866727bc5b4c00/lib/libipopt.so (unknown line)
_ZN5Ipopt16IpoptApplication11OptimizeNLPERKNS_8SmartPtrINS_3NLPEEE at /home/montalex/.julia/artifacts/fae193a058a11ca67f4e685cbe866727bc5b4c00/lib/libipopt.so (unknown line)
_ZN5Ipopt16IpoptApplication12OptimizeTNLPERKNS_8SmartPtrINS_4TNLPEEE at /home/montalex/.julia/artifacts/fae193a058a11ca67f4e685cbe866727bc5b4c00/lib/libipopt.so (unknown line)
IpoptSolve at /home/montalex/.julia/artifacts/fae193a058a11ca67f4e685cbe866727bc5b4c00/lib/libipopt.so (unknown line)
IpoptSolve at /home/montalex/.julia/packages/Ipopt/bqp63/src/C_wrapper.jl:442
#solve!#7 at /home/montalex/.julia/packages/NLPModelsIpopt/0YgvC/src/NLPModelsIpopt.jl:240
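Both traces end inside `dgemm`, but in two unrelated BLAS implementations (OpenBLAS's `dgemm_beta_ZEN` and MKL's `mkl_blas_def_dgemm_mscale`), and the MKL trace shows the call arriving from SPRAL via `form_contrib`, `host_gemm` and `spral_c_dgemm`. Judging by their names, both crashing frames look like the routines that scale C by beta. That pattern is consistent with the arguments or buffers handed to `dgemm` being wrong rather than with a BLAS bug, although nothing in this thread confirms it. Below is a minimal C++ sketch of a debug check one could place just before such a call. `check_dgemm` and `extent` are hypothetical helpers, not SPRAL functions, and the allocated lengths `len_a`, `len_b`, `len_c` are assumed to be known to the caller. The scalar checks mirror those documented for the reference BLAS `dgemm`; the buffer-length checks are the part no BLAS library can perform on its own.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>

// Hypothetical debug helpers (not part of SPRAL or of any BLAS library).
// Column-major dgemm computes C := alpha*op(A)*op(B) + beta*C, where op(A)
// is m x k, op(B) is k x n and C is m x n.

// Number of array elements spanned by a column-major matrix with leading
// dimension ld, `rows` rows and `cols` columns. Computed in 64-bit so that
// very large matrices do not overflow the way an `int` product could.
std::int64_t extent(int ld, int rows, int cols) {
    if (rows == 0 || cols == 0) return 0;
    return static_cast<std::int64_t>(ld) * (cols - 1) + rows;
}

// Returns "ok" if the arguments satisfy the reference-BLAS constraints and
// every buffer (len_* = allocated length in doubles) covers what dgemm may
// access; otherwise returns a short description of the first problem found.
std::string check_dgemm(char transa, char transb, int m, int n, int k,
                        int lda, std::int64_t len_a,
                        int ldb, std::int64_t len_b,
                        int ldc, std::int64_t len_c) {
    auto is_trans = [](char t) {
        return t == 'T' || t == 't' || t == 'C' || t == 'c';
    };
    auto is_valid = [&](char t) { return t == 'N' || t == 'n' || is_trans(t); };
    if (!is_valid(transa) || !is_valid(transb)) return "bad trans flag";
    if (m < 0 || n < 0 || k < 0) return "negative dimension";

    // A is stored m x k without transpose, k x m with it;
    // likewise B is stored k x n or n x k.
    const int rows_a = is_trans(transa) ? k : m;
    const int cols_a = is_trans(transa) ? m : k;
    const int rows_b = is_trans(transb) ? n : k;
    const int cols_b = is_trans(transb) ? k : n;

    // Leading-dimension rules as documented for the reference BLAS dgemm.
    if (lda < std::max(1, rows_a)) return "lda too small";
    if (ldb < std::max(1, rows_b)) return "ldb too small";
    if (ldc < std::max(1, m)) return "ldc too small";

    // Buffer-length checks: BLAS has no way to verify these itself.
    if (extent(lda, rows_a, cols_a) > len_a) return "A buffer too short";
    if (extent(ldb, rows_b, cols_b) > len_b) return "B buffer too short";
    if (extent(ldc, m, n) > len_c) return "C buffer too short";
    return "ok";
}
```

For example, `check_dgemm('N', 'N', 2, 3, 4, 2, 8, 4, 12, 2, 6)` returns "ok", while passing `len_a = 7` instead returns "A buffer too short". Because `extent` works in 64-bit arithmetic, it also shows when an offset would no longer fit in a 32-bit `int`: `extent(50000, 50000, 50000)` is 2,500,000,000, which exceeds `INT32_MAX` (2,147,483,647).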
@mjacobse do you have any idea what’s going on here or how we could best debug this?
Does it also segfault when running SSIDS serially, i.e. when SPRAL is built without OpenMP or run with the number of OpenMP threads set to 1? A way to reproduce this would be helpful; perhaps the offending matrix can be exported as a Rutherford-Boeing (.rb) file?
For JuMP-dev 2024, I wanted to report the elapsed time needed to solve a very large optimization problem with Ipopt and different linear solvers (MUMPS, SPRAL, MA27, MA57), but I got a segmentation fault with SPRAL.
The culprit seems to be this function.