crash in test_runtime_dm #609

MarcusGDaniels · 2023-09-06T16:09:04Z

Required prerequisites

Make sure you've read the documentation. Your issue may be addressed there.
Search the issue tracker to verify that this hasn't already been reported. +1 or comment there if it has.
If possible, make a PR with a failing test to give us a starting point to work on!

Describe the bug

The unit test test_runtime_dm does this on Ubuntu 22.04

mdaniels@daniels:~/src/cuda-quantum/build/unittests$ ./test_runtime_dm
[==========] Running 67 tests from 21 test suites.
[----------] Global test environment set-up.
[----------] 2 tests from dm_AdjointTester
[ RUN ] dm_AdjointTester.checkSimple
{ 0:1000 }
{ 00000:1000 }
{ 0:1000 }
{ 00:1000 }
{ 101:1000 }
[ OK ] dm_AdjointTester.checkSimple (71 ms)
[ RUN ] dm_AdjointTester.checkNestedAdjoint
{ 000:233 010:188 111:99 001:4 011:69 100:237 101:25 110:145 }
[ OK ] dm_AdjointTester.checkNestedAdjoint (6 ms)
[----------] 2 tests from dm_AdjointTester (78 ms total)

[----------] 17 tests from dm_BuilderTester
[ RUN ] dm_BuilderTester.checkSimple
realloc(): invalid pointer
Aborted (core dumped)

Steps to reproduce the bug

went into build directory and unittests and ran ./test_runtime_dm

Expected behavior

passing test

Is this a regression? If it is, put the last known working version (or commit) here.

Not a regression -- first time I have built cuda-quantum.

Environment

CUDA Quantum version: Today's git: 1537660
Python version: 3.10.12
C++ compiler: clang++ 16.0.0,
Operating system: Ubuntu 22.04

Suggestions

No response

The text was updated successfully, but these errors were encountered:

MarcusGDaniels · 2023-09-07T18:11:52Z

I built LLVM and cuda-quantum in Debug mode, and this is what I see:

   50  	}
   51  	
   52  	LLVM_ATTRIBUTE_RETURNS_NONNULL inline void *safe_realloc(void *Ptr, size_t Sz) {
-> 53  	  void *Result = std::realloc(Ptr, Sz);
   54  	  if (Result == nullptr) {
   55  	    // It is implementation-defined whether allocation occurs if the space
   56  	    // requested is zero (ISO/IEC 9899:2018 7.22.3). Retry, requesting
(lldb) 
frame #9: 0x00007fffeed7dd9e libcudaq-mlir-runtime.so`llvm::SmallVectorBase<unsigned int>::grow_pod(this=0x00007fffffff7840, FirstEl=0x00007fffffff7850, MinSize=4012645365, TSize=8) at SmallVector.cpp:151:33
   148 	    memcpy(NewElts, this->BeginX, size() * TSize);
   149 	  } else {
   150 	    // If this wasn't grown from the inline copy, grow the allocated space.
-> 151 	    NewElts = llvm::safe_realloc(this->BeginX, NewCapacity * TSize);
   152 	    if (NewElts == FirstEl)
   153 	      NewElts = replaceAllocation(NewElts, TSize, NewCapacity, size());
   154 	  }
(lldb) 
frame #10: 0x00007fffe7effa2b libcudaq-mlir-runtime.so`mlir::ParseResult llvm::function_ref<mlir::ParseResult ()>::callback_fn<mlir::AsmParser::parseTypeList(llvm::SmallVectorImpl<mlir::Type>&)::'lambda'()>(long) + 91
libcudaq-mlir-runtime.so`llvm::function_ref<mlir::ParseResult ()>::callback_fn<mlir::AsmParser::parseTypeList(llvm::SmallVectorImpl<mlir::Type>&)::'lambda'()>:
->  0x7fffe7effa2b <+91>:  movl   0x8(%r14), %eax
    0x7fffe7effa2f <+95>:  movq   (%r14), %rcx
    0x7fffe7effa32 <+98>:  movq   $0x0, (%rcx,%rax,8)
    0x7fffe7effa3a <+106>: movl   0x8(%r14), %eax
(lldb) 
frame #11: 0x00007fffeb662462 libcudaq-mlir-runtime.so`llvm::function_ref<mlir::ParseResult ()>::operator()(this=0x00007fffffff72a0) const at STLFunctionalExtras.h:68:12
   65  	        callable(reinterpret_cast<intptr_t>(&callable)) {}
   66  	
   67  	  Ret operator()(Params ...params) const {
-> 68  	    return callback(callable, std::forward<Params>(params)...);
   69  	  }
   70  	
   71  	  explicit operator bool() const { return callback; }
(lldb)

bmhowe23 · 2023-09-07T21:11:39Z

Hi Marcus - I was able to reproduce your failure. I think it has something to do with this being the first test running after the noise tests. For example, I can also reproduce the problem running just 2 tests, like this:

unittests/test_runtime_dm --gtest_filter=dm_NoiseTest.checkPhaseFlipType:dm_GetStateTester.checkSimple

I think we need to add cudaq::unset_noise(); either as cleanup after the noise tests or do that in the initialization of all tests.

For example, as a quick test, I added this patch, and the problem went away for me. Can you confirm that this fixes the problem in your environment, too? If so, we'll get a patch going soon.

--- a/unittests/integration/get_state_tester.cpp
+++ b/unittests/integration/get_state_tester.cpp
@@ -21,6 +21,8 @@ CUDAQ_TEST(GetStateTester, checkSimple) {
     cx(q, r);
   };
 
+  cudaq::unset_noise();
+
   auto state = cudaq::get_state(kernel);
   state.dump();
 #ifdef CUDAQ_BACKEND_DM

Also, we typically run our tests using ctest. For example, you can run the dm tests using this command: ctest -R "^dm_". Your way should also work, too, so I'm trying to say we shouldn't fix it ... I just wanted to give you other options in case you weren't aware of them.

MarcusGDaniels · 2023-09-07T21:33:02Z

I still get the realloc(): invalid pointer crash, and correctness of the unit tests is the same at 79%.

MarcusGDaniels · 2023-09-07T22:03:52Z

What version of LLVM is preferred? On Sep 7, 2023, at 2:11 PM, Ben Howe ***@***.***> wrote: Hi Marcus - I was able to reproduce your failure. I think it has something to do with this being the first test running after the noise tests. For example, I can also reproduce the problem running just 2 tests, like this: unittests/test_runtime_dm --gtest_filter=dm_NoiseTest.checkPhaseFlipType:dm_GetStateTester.checkSimple I think we need to add cudaq::unset_noise(); either as cleanup after the noise tests or do that in the initialization of all tests. For example, as a quick test, I added this patch, and the problem went away for me. Can you confirm that this fixes the problem in your environment, too? If so, we'll get a patch going soon.

…

--- a/unittests/integration/get_state_tester.cpp +++ b/unittests/integration/get_state_tester.cpp @@ -21,6 +21,8 @@ CUDAQ_TEST(GetStateTester, checkSimple) { cx(q, r); }; + cudaq::unset_noise(); + auto state = cudaq::get_state(kernel); state.dump(); #ifdef CUDAQ_BACKEND_DM Also, we typically run our tests using ctest. For example, you can run the dm tests using this command: ctest -R "^dm_". Your way should also work, too, so I'm trying to say we shouldn't fix it ... I just wanted to give you other options in case you weren't aware of them. — Reply to this email directly, view it on GitHub<#609 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AACVVP6FENWOPVEMIHPQSMLXZI2BNANCNFSM6AAAAAA4NSQE64>. You are receiving this because you authored the thread.Message ID: ***@***.***>

bmhowe23 · 2023-09-07T22:32:14Z

Our project is designed to be running with the same LLVM as is in the repo (tpls/llvm), which is currently this. It is not a named release version, but it's slightly before LLVM 16. Are you using a Docker image to test this?

MarcusGDaniels · 2023-09-07T22:34:26Z

Oh, I thought I could use a stock LLVM. All from-source. On Sep 7, 2023, at 3:32 PM, Ben Howe ***@***.***> wrote: Our project is designed to be running with the same LLVM as is in the repo (tpls/llvm), which is currently this<https://github.com/llvm/llvm-project/tree/c0b45fef155fbe3f17f9a6f99074682c69545488>. It is not a named release version, but it's slightly before LLVM 16. Are you using a Docker image to test this? — Reply to this email directly, view it on GitHub<#609 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AACVVP5AWY6CMRA32NELJGDXZJDPTANCNFSM6AAAAAA4NSQE64>. You are receiving this because you authored the thread.Message ID: ***@***.***>

bmhowe23 · 2023-09-07T22:37:51Z

Perhaps @schweitzpgi can weigh in to confirm or deny, but I thought it had to be aligned with the LLVM submodule in our repo.

In any case, it's very strange that we both saw similar errors in similar spots, but the fix for my environment didn't change anything for yours. You probably did this, but just to confirm .... did you go into the build directory and run ninja again after updating the unittests/integration/get_state_tester.cpp file?

schweitzpgi · 2023-09-09T00:03:48Z

Perhaps @schweitzpgi can weigh in to confirm or deny, but I thought it had to be aligned with the LLVM submodule in our repo.

The project is stuck in a holding pattern at LLVM 16.x still. I believe some people have used downloads of prebuilt. official LLVM 16 with some success. The container image is built against the pre-16 LLVM that can be found in tpls/llvm.

I will open an issue (if there isn't one) to at least bump the container/CI environment up to the officially released LLVM 16.

MarcusGDaniels · 2023-09-09T04:59:25Z

Switching to the tpls/llvm did prevent the unittest errors. Thanks, Marcus From: Eric Schweitz ***@***.***> Date: Friday, September 8, 2023 at 5:04 PM To: NVIDIA/cuda-quantum ***@***.***> Cc: Marcus Daniels ***@***.***>, Author ***@***.***> Subject: Re: [NVIDIA/cuda-quantum] crash in test_runtime_dm (Issue #609) Perhaps @schweitzpgi<https://github.com/schweitzpgi> can weigh in to confirm or deny, but I thought it had to be aligned with the LLVM submodule in our repo. The project is stuck in a holding pattern at LLVM 16.x still. I believe some people have used downloads of prebuilt. official LLVM 16 with some success. The container image is built against the pre-16 LLVM that can be found in tpls/llvm. I will open an issue (if there isn't one) to at least bump the container/CI environment up to the officially released LLVM 16. — Reply to this email directly, view it on GitHub<#609 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AACVVP7SNZBS6A7GOKQFHO3XZOW67ANCNFSM6AAAAAA4NSQE64>. You are receiving this because you authored the thread.Message ID: ***@***.***>

Issue discovered while investigating Issue NVIDIA#609

This helps fix issue #609 by unsetting noise models at the end of noise tests. Without this change, the prior test's noise model could be used in subsequent tests, which is undesirable.

bmhowe23 · 2023-09-12T17:31:06Z

Closing this because I believe it is all covered by PR #635 and the new issue #633 now.

This helps fix issue NVIDIA#609 by unsetting noise models at the end of noise tests. Without this change, the prior test's noise model could be used in subsequent tests, which is undesirable.

schweitzpgi mentioned this issue Sep 9, 2023

Bump the LLVM version to the latest official LLVM 16 release #633

Closed

bmhowe23 added a commit to bmhowe23/cuda-quantum that referenced this issue Sep 9, 2023

Unset noise after noise tests

263686c

Issue discovered while investigating Issue NVIDIA#609

bmhowe23 mentioned this issue Sep 9, 2023

Unset noise after noise tests #635

Merged

bmhowe23 added a commit to bmhowe23/cuda-quantum that referenced this issue Sep 12, 2023

Unset noise after noise tests

dae7cd6

Issue discovered while investigating Issue NVIDIA#609

bmhowe23 closed this as completed Sep 12, 2023

bettinaheim added ecosystem bug Something isn't working labels Oct 26, 2023

bettinaheim added this to the release 0.5.0 milestone Oct 26, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

crash in test_runtime_dm #609

crash in test_runtime_dm #609

MarcusGDaniels commented Sep 6, 2023

MarcusGDaniels commented Sep 7, 2023 •

edited

Loading

bmhowe23 commented Sep 7, 2023

MarcusGDaniels commented Sep 7, 2023

MarcusGDaniels commented Sep 7, 2023 via email

bmhowe23 commented Sep 7, 2023

MarcusGDaniels commented Sep 7, 2023 via email

bmhowe23 commented Sep 7, 2023

schweitzpgi commented Sep 9, 2023

MarcusGDaniels commented Sep 9, 2023 via email

bmhowe23 commented Sep 12, 2023

crash in test_runtime_dm #609

crash in test_runtime_dm #609

Comments

MarcusGDaniels commented Sep 6, 2023

Required prerequisites

Describe the bug

Steps to reproduce the bug

Expected behavior

Is this a regression? If it is, put the last known working version (or commit) here.

Environment

Suggestions

MarcusGDaniels commented Sep 7, 2023 • edited Loading

bmhowe23 commented Sep 7, 2023

MarcusGDaniels commented Sep 7, 2023

MarcusGDaniels commented Sep 7, 2023 via email

bmhowe23 commented Sep 7, 2023

MarcusGDaniels commented Sep 7, 2023 via email

bmhowe23 commented Sep 7, 2023

schweitzpgi commented Sep 9, 2023

MarcusGDaniels commented Sep 9, 2023 via email

bmhowe23 commented Sep 12, 2023

MarcusGDaniels commented Sep 7, 2023 •

edited

Loading