Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

crash in test_runtime_dm #609

Closed
3 tasks done
MarcusGDaniels opened this issue Sep 6, 2023 · 10 comments
Closed
3 tasks done

crash in test_runtime_dm #609

MarcusGDaniels opened this issue Sep 6, 2023 · 10 comments
Labels
bug Something isn't working ecosystem
Milestone

Comments

@MarcusGDaniels
Copy link

Required prerequisites

  • Make sure you've read the documentation. Your issue may be addressed there.
  • Search the issue tracker to verify that this hasn't already been reported. +1 or comment there if it has.
  • If possible, make a PR with a failing test to give us a starting point to work on!

Describe the bug

The unit test test_runtime_dm does this on Ubuntu 22.04

mdaniels@daniels:~/src/cuda-quantum/build/unittests$ ./test_runtime_dm
[==========] Running 67 tests from 21 test suites.
[----------] Global test environment set-up.
[----------] 2 tests from dm_AdjointTester
[ RUN ] dm_AdjointTester.checkSimple
{ 0:1000 }
{ 00000:1000 }
{ 0:1000 }
{ 00:1000 }
{ 101:1000 }
[ OK ] dm_AdjointTester.checkSimple (71 ms)
[ RUN ] dm_AdjointTester.checkNestedAdjoint
{ 000:233 010:188 111:99 001:4 011:69 100:237 101:25 110:145 }
[ OK ] dm_AdjointTester.checkNestedAdjoint (6 ms)
[----------] 2 tests from dm_AdjointTester (78 ms total)

[----------] 17 tests from dm_BuilderTester
[ RUN ] dm_BuilderTester.checkSimple
realloc(): invalid pointer
Aborted (core dumped)

Steps to reproduce the bug

went into build directory and unittests and ran ./test_runtime_dm

Expected behavior

passing test

Is this a regression? If it is, put the last known working version (or commit) here.

Not a regression -- first time I have built cuda-quantum.

Environment

  • CUDA Quantum version: Today's git: 1537660
  • Python version: 3.10.12
  • C++ compiler: clang++ 16.0.0,
  • Operating system: Ubuntu 22.04

Suggestions

No response

@MarcusGDaniels
Copy link
Author

MarcusGDaniels commented Sep 7, 2023

I built LLVM and cuda-quantum in Debug mode, and this is what I see:

   50  	}
   51  	
   52  	LLVM_ATTRIBUTE_RETURNS_NONNULL inline void *safe_realloc(void *Ptr, size_t Sz) {
-> 53  	  void *Result = std::realloc(Ptr, Sz);
   54  	  if (Result == nullptr) {
   55  	    // It is implementation-defined whether allocation occurs if the space
   56  	    // requested is zero (ISO/IEC 9899:2018 7.22.3). Retry, requesting
(lldb) 
frame #9: 0x00007fffeed7dd9e libcudaq-mlir-runtime.so`llvm::SmallVectorBase<unsigned int>::grow_pod(this=0x00007fffffff7840, FirstEl=0x00007fffffff7850, MinSize=4012645365, TSize=8) at SmallVector.cpp:151:33
   148 	    memcpy(NewElts, this->BeginX, size() * TSize);
   149 	  } else {
   150 	    // If this wasn't grown from the inline copy, grow the allocated space.
-> 151 	    NewElts = llvm::safe_realloc(this->BeginX, NewCapacity * TSize);
   152 	    if (NewElts == FirstEl)
   153 	      NewElts = replaceAllocation(NewElts, TSize, NewCapacity, size());
   154 	  }
(lldb) 
frame #10: 0x00007fffe7effa2b libcudaq-mlir-runtime.so`mlir::ParseResult llvm::function_ref<mlir::ParseResult ()>::callback_fn<mlir::AsmParser::parseTypeList(llvm::SmallVectorImpl<mlir::Type>&)::'lambda'()>(long) + 91
libcudaq-mlir-runtime.so`llvm::function_ref<mlir::ParseResult ()>::callback_fn<mlir::AsmParser::parseTypeList(llvm::SmallVectorImpl<mlir::Type>&)::'lambda'()>:
->  0x7fffe7effa2b <+91>:  movl   0x8(%r14), %eax
    0x7fffe7effa2f <+95>:  movq   (%r14), %rcx
    0x7fffe7effa32 <+98>:  movq   $0x0, (%rcx,%rax,8)
    0x7fffe7effa3a <+106>: movl   0x8(%r14), %eax
(lldb) 
frame #11: 0x00007fffeb662462 libcudaq-mlir-runtime.so`llvm::function_ref<mlir::ParseResult ()>::operator()(this=0x00007fffffff72a0) const at STLFunctionalExtras.h:68:12
   65  	        callable(reinterpret_cast<intptr_t>(&callable)) {}
   66  	
   67  	  Ret operator()(Params ...params) const {
-> 68  	    return callback(callable, std::forward<Params>(params)...);
   69  	  }
   70  	
   71  	  explicit operator bool() const { return callback; }
(lldb) 

@bmhowe23
Copy link
Collaborator

bmhowe23 commented Sep 7, 2023

Hi Marcus - I was able to reproduce your failure. I think it has something to do with this being the first test running after the noise tests. For example, I can also reproduce the problem running just 2 tests, like this:

unittests/test_runtime_dm --gtest_filter=dm_NoiseTest.checkPhaseFlipType:dm_GetStateTester.checkSimple

I think we need to add cudaq::unset_noise(); either as cleanup after the noise tests or do that in the initialization of all tests.

For example, as a quick test, I added this patch, and the problem went away for me. Can you confirm that this fixes the problem in your environment, too? If so, we'll get a patch going soon.

--- a/unittests/integration/get_state_tester.cpp
+++ b/unittests/integration/get_state_tester.cpp
@@ -21,6 +21,8 @@ CUDAQ_TEST(GetStateTester, checkSimple) {
     cx(q, r);
   };
 
+  cudaq::unset_noise();
+
   auto state = cudaq::get_state(kernel);
   state.dump();
 #ifdef CUDAQ_BACKEND_DM

Also, we typically run our tests using ctest. For example, you can run the dm tests using this command: ctest -R "^dm_". Your way should also work, too, so I'm trying to say we shouldn't fix it ... I just wanted to give you other options in case you weren't aware of them.

@MarcusGDaniels
Copy link
Author

I still get the realloc(): invalid pointer crash, and correctness of the unit tests is the same at 79%.

@MarcusGDaniels
Copy link
Author

MarcusGDaniels commented Sep 7, 2023 via email

@bmhowe23
Copy link
Collaborator

bmhowe23 commented Sep 7, 2023

Our project is designed to be running with the same LLVM as is in the repo (tpls/llvm), which is currently this. It is not a named release version, but it's slightly before LLVM 16. Are you using a Docker image to test this?

@MarcusGDaniels
Copy link
Author

MarcusGDaniels commented Sep 7, 2023 via email

@bmhowe23
Copy link
Collaborator

bmhowe23 commented Sep 7, 2023

Perhaps @schweitzpgi can weigh in to confirm or deny, but I thought it had to be aligned with the LLVM submodule in our repo.

In any case, it's very strange that we both saw similar errors in similar spots, but the fix for my environment didn't change anything for yours. You probably did this, but just to confirm .... did you go into the build directory and run ninja again after updating the unittests/integration/get_state_tester.cpp file?

@schweitzpgi
Copy link
Collaborator

Perhaps @schweitzpgi can weigh in to confirm or deny, but I thought it had to be aligned with the LLVM submodule in our repo.

The project is stuck in a holding pattern at LLVM 16.x still. I believe some people have used downloads of prebuilt. official LLVM 16 with some success. The container image is built against the pre-16 LLVM that can be found in tpls/llvm.

I will open an issue (if there isn't one) to at least bump the container/CI environment up to the officially released LLVM 16.

@MarcusGDaniels
Copy link
Author

MarcusGDaniels commented Sep 9, 2023 via email

bmhowe23 added a commit to bmhowe23/cuda-quantum that referenced this issue Sep 9, 2023
Issue discovered while investigating Issue NVIDIA#609
bmhowe23 added a commit to bmhowe23/cuda-quantum that referenced this issue Sep 12, 2023
Issue discovered while investigating Issue NVIDIA#609
bmhowe23 added a commit that referenced this issue Sep 12, 2023
This helps fix issue #609 by unsetting noise models at the end of noise tests. Without this change, the prior test's noise model could be used in subsequent tests, which is undesirable.
@bmhowe23
Copy link
Collaborator

Closing this because I believe it is all covered by PR #635 and the new issue #633 now.

@bettinaheim bettinaheim added ecosystem bug Something isn't working labels Oct 26, 2023
@bettinaheim bettinaheim added this to the release 0.5.0 milestone Oct 26, 2023
MarkusPfundstein pushed a commit to fermioniq/cuda-quantum that referenced this issue Sep 23, 2024
This helps fix issue NVIDIA#609 by unsetting noise models at the end of noise tests. Without this change, the prior test's noise model could be used in subsequent tests, which is undesirable.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working ecosystem
Projects
None yet
Development

No branches or pull requests

4 participants