Software versions
Python : 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:36:13) [GCC 12.3.0]
Platform : Linux-4.18.0-372.26.1.el8_6.x86_64-x86_64-with-glibc2.28
Legion : v24.01.00.dev-38-g90944d7
Legate : 24.01.00.dev+38.g90944d7
WARNING: Disabling control replication for interactive run
Disable Control Replication
--------------------------------------------------------------------------
WARNING: There was an error initializing an OpenFabrics device.
Local host: c315-012
Local device: mlx5_0
--------------------------------------------------------------------------
Cunumeric : 24.01.00.dev+29.g503affb8
Numpy : 2.0.0
Scipy : 1.14.0
Numba : 0.60.0
/work/08435/srvenkat/ls6/miniconda3/lib/python3.12/site-packages/conda_package_streaming/package_streaming.py:25: UserWarning: zstandard could not be imported. Running without .conda support.
warnings.warn("zstandard could not be imported. Running without .conda support.")
/work/08435/srvenkat/ls6/miniconda3/lib/python3.12/site-packages/conda_package_handling/api.py:29: UserWarning: Install zstandard Python bindings for .conda support
_warnings.warn("Install zstandard Python bindings for .conda support")
CTK package : cuda-version-12.4-hbda6634_3 (pkgs/main)
GPU driver : 535.104.12
GPU devices :
GPU 0: NVIDIA A100-PCIE-40GB
GPU 1: NVIDIA A100-PCIE-40GB
GPU 2: NVIDIA A100-PCIE-40GB
Jupyter notebook / Jupyter Lab version
No response
Expected behavior
I ran the cholesky.py example with -n 257 and expected to see the timing/flops output.
Observed behavior
I got an error saying the matrix is not positive definite. This was strange since I believe the example uses an identity matrix. I do not get the error for -n 256 or less.
Example code or instructions
legate --gpus 1 ./cholesky.py -n 257
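As a sanity check on the input itself, here is a minimal sketch in plain NumPy (not cuNumeric, so it does not reproduce the failure). It assumes, as described above, that the example factors an identity matrix; the helper name `is_positive_definite` is hypothetical and not part of cholesky.py.

```python
import numpy as np


def is_positive_definite(a: np.ndarray) -> bool:
    """Return True if NumPy's Cholesky factorization succeeds on `a`."""
    try:
        np.linalg.cholesky(a)
    except np.linalg.LinAlgError:
        # NumPy raises LinAlgError when the matrix is not positive definite.
        return False
    return True


# Sizes on either side of the boundary reported in this issue.
# Assumption: the input is an identity matrix of size n.
results = {n: is_positive_definite(np.eye(n)) for n in (256, 257)}
print(results)
```

An identity matrix is positive definite at every size, so both entries should come back True here. If so, the input at n = 257 is not the problem, and the error likely originates in the GPU Cholesky path rather than in the matrix being factored.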
Stack traceback or browser console output
(legate-ucx) c315-012.ls6(1033)$ legate --gpus 1 ./cholesky.py -n 257
--------------------------------------------------------------------------
WARNING: There was an error initializing an OpenFabrics device.
Local host: c315-012
Local device: mlx5_0
--------------------------------------------------------------------------
Elapsed Time: 52.263 ms
108267.2062453361 GOP/s
[0 - 14f511066000] 1.320818 {6}{python}: python exception occurred within task:
numpy.linalg.LinAlgError: Matrix is not positive definite
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/work/08435/srvenkat/ls6/miniconda3/envs/legate-ucx/lib/python3.1/site-packages/legion_top.py", line 481, in legion_python_main
cleanup()
File "/work/08435/srvenkat/ls6/miniconda3/envs/legate-ucx/lib/python3.1/site-packages/legate/core/runtime.py", line 2164, in _cleanup_legate_runtime
runtime.destroy()
File "/work/08435/srvenkat/ls6/miniconda3/envs/legate-ucx/lib/python3.1/site-packages/legate/core/runtime.py", line 1322, in destroy
self.raise_exceptions()
File "/work/08435/srvenkat/ls6/miniconda3/envs/legate-ucx/lib/python3.1/site-packages/legate/core/runtime.py", line 2075, in raise_exceptions
pending.raise_exception()
File "/work/08435/srvenkat/ls6/miniconda3/envs/legate-ucx/lib/python3.1/site-packages/legate/core/exception.py", line 50, in raise_exception
raise exn_reraised from exn_original
numpy.linalg.LinAlgError: Matrix is not positive definite
legion_python: /work/08435/srvenkat/ls6/legate.core/_skbuild/linux-x86_64-3.11/cmake-build/_deps/legion-src/runtime/realm/python/python_module.cc:1054: virtual void Realm::LocalPythonProcessor::execute_task(Realm::Processor::TaskFuncID, const Realm::ByteArrayRef&): Assertion `0' failed.
Signal 6 received by node 0, process 2983422 (thread 14f511066000) - obtaining backtrace
Signal 6 received by process 2983422 (thread 14f511066000) at: stack trace: 14 frames
[0] = raise at unknown file:0 [000014f78aaeba9f]
[1] = abort at unknown file:0 [000014f78aabee04]
[2] = __assert_fail_base.cold.0 at unknown file:0 [000014f78aabecd8]
[3] = __assert_fail at unknown file:0 [000014f78aae43f5]
[4] = Realm::LocalPythonProcessor::execute_task(unsigned int, Realm::ByteArrayRef const&) at unknown file:0 [000014f78b41463a]
[5] = Realm::Task::execute_on_processor(Realm::Processor) at unknown file:0 [000014f78b3aaf41]
[6] = Realm::KernelThreadTaskScheduler::execute_task(Realm::Task*) at unknown file:0 [000014f78b3aafd5]
[7] = Realm::PythonThreadTaskScheduler::execute_task(Realm::Task*) at unknown file:0 [000014f78b41740c]
[8] = Realm::ThreadedTaskScheduler::scheduler_loop() at unknown file:0 [000014f78b3a9325]
[9] = Realm::PythonThreadTaskScheduler::python_scheduler_loop() at unknown file:0 [000014f78b415f1e]
[10] = Realm::KernelThread::pthread_entry(void*) at unknown file:0 [000014f78b3aed73]
[11] = start_thread at unknown file:0 [000014f7889581ce]
[12] = __clone at unknown file:0 [000014f78aad6dd2]
[13] = unknown symbol at unknown file:0 [ffffffffffffffff]