Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update Array hierarchy and allocate nD arrays in a contiguous block by default #1236

Merged
merged 25 commits into from
May 23, 2024

Conversation

KrisThielemans
Copy link
Collaborator

@KrisThielemans KrisThielemans commented Aug 24, 2023

addresses the following

  • Array and VectorWithOffset did not have move constructors, so we were using the default generated ones which might not be optimal
  • some copy constructors were the default generates ones, but they now needed to be added explicitly
  • some assignment operators were the default generates ones, which could imply unnecessary reallocation (in 1D)
  • nD Arrays used many different separately allocated 1D Arrays. This had the consequence that memory is not contiguous (which prevents some optimisations) as well as many calls to new[] and delete[], which turns out to be slow. The default is not to allocate one single block, and let the nD Array point into that. This happens transparent to the rest of the code. CAVEAT: it does mean that growing an nD array in the "first" dimensions could now be less efficient (as it will reallocate everything).

Currently ctest is happy on my VM and test_Array and test_VectorWithOffset work fine via valgrind (Ubuntu, gcc8). We'll see what GHA and AppVeyor say...

@markus-jehl this should be fine for you to test now.

Things still to do:

  • check if growing an nD contiguous array deallocates original block if it can
  • adapt other code to take advantage of this, e.g. reading/writing nD Arrays could now read/write in 1 chunk, which would be faster, e.g. read_data, convert_array
  • make nD Array assignment such that it doesn't create a copy first if reallocation can be avoided.
  • check if we need move constructors/assignments in derived classes such as Sinogram etc, or are the default ones ok. (operator= might need implemented to benefit from previous bullet)
  • check if it actually works for everything and improves practical run-times

@markus-jehl
Copy link
Contributor

Thanks! I will have a look

@markus-jehl
Copy link
Contributor

Does building SWIG work for you? I get some errors, e.g.:

In file included from /workspace/python-reconstruction-pipeline/libs/build/stir/Stir-prefix/src/Stir-build/src/swig/CMakeFiles/_stir.dir/stirPYTHON_wrap.cxx:3964:
In file included from /workspace/python-reconstruction-pipeline/libs/build/stir/Stir-prefix/src/Stir/src/include/stir/ProjDataInfoBlocksOnCylindricalNoArcCorr.h:27:
In file included from /workspace/python-reconstruction-pipeline/libs/build/stir/Stir-prefix/src/Stir/src/include/stir/ProjDataInfoGenericNoArcCorr.h:27:
In file included from /workspace/python-reconstruction-pipeline/libs/build/stir/Stir-prefix/src/Stir/src/include/stir/GeometryBlocksOnCylindrical.h:31:
In file included from /workspace/python-reconstruction-pipeline/libs/build/stir/Stir-prefix/src/Stir/src/include/stir/Array.h:34:
In file included from /workspace/python-reconstruction-pipeline/libs/build/stir/Stir-prefix/src/Stir/src/include/stir/NumericVectorWithOffset.h:166:
/workspace/python-reconstruction-pipeline/libs/build/stir/Stir-prefix/src/Stir/src/include/stir/NumericVectorWithOffset.inl:49:5: error: no matching constructor for initialization of 'VectorWithOffset<stir::Array<1, float>>'
  : base_type(min_index, max_index, data_ptr)
    ^         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/workspace/python-reconstruction-pipeline/libs/build/stir/Stir-prefix/src/Stir-build/src/swig/CMakeFiles/_stir.dir/stirPYTHON_wrap.cxx:51302:85: note: in instantiation of member function 'stir::NumericVectorWithOffset<stir::Array<1, float>, float>::NumericVectorWithOffset' requested here
      result = (stir::NumericVectorWithOffset< stir::Array< 1,float >,float > *)new stir::NumericVectorWithOffset< stir::Array< 1,float >,float >(arg1,arg2,arg3);
                                                                                    ^
/workspace/python-reconstruction-pipeline/libs/build/stir/Stir-prefix/src/Stir/src/include/stir/VectorWithOffset.h:159:5: note: candidate constructor not viable: no known conversion from 'float *const' to 'stir::Array<1, float> *const' for 3rd argument
    VectorWithOffset(const int min_index, const int max_index, 
    ^
/workspace/python-reconstruction-pipeline/libs/build/stir/Stir-prefix/src/Stir/src/include/stir/VectorWithOffset.h:140:5: note: candidate constructor not viable: no known conversion from 'const int' to 'stir::Array<1, float> *const' for 2nd argument
    VectorWithOffset(const int hsz, 
    ^
/workspace/python-reconstruction-pipeline/libs/build/stir/Stir-prefix/src/Stir/src/include/stir/VectorWithOffset.h:136:10: note: candidate constructor not viable: requires 2 arguments, but 3 were provided
  inline VectorWithOffset(const int min_index, const int max_index);
         ^
/workspace/python-reconstruction-pipeline/libs/build/stir/Stir-prefix/src/Stir/src/include/stir/VectorWithOffset.h:146:5: note: candidate constructor not viable: requires 2 arguments, but 3 were provided
    VectorWithOffset(const int hsz, 
    ^
/workspace/python-reconstruction-pipeline/libs/build/stir/Stir-prefix/src/Stir/src/include/stir/VectorWithOffset.h:153:5: note: candidate constructor not viable: requires 4 arguments, but 3 were provided
    VectorWithOffset(const int min_index, const int max_index, 
    ^
/workspace/python-reconstruction-pipeline/libs/build/stir/Stir-prefix/src/Stir/src/include/stir/VectorWithOffset.h:133:19: note: candidate constructor not viable: requires single argument 'hsz', but 3 arguments were provided
  inline explicit VectorWithOffset(const int hsz);
                  ^
/workspace/python-reconstruction-pipeline/libs/build/stir/Stir-prefix/src/Stir/src/include/stir/VectorWithOffset.h:165:10: note: candidate constructor not viable: requires single argument 'il', but 3 arguments were provided
  inline VectorWithOffset(const VectorWithOffset &il) ;
         ^
/workspace/python-reconstruction-pipeline/libs/build/stir/Stir-prefix/src/Stir/src/include/stir/VectorWithOffset.h:187:3: note: candidate constructor not viable: requires single argument 'other', but 3 arguments were provided
  VectorWithOffset(VectorWithOffset&& other) noexcept;
  ^
/workspace/python-reconstruction-pipeline/libs/build/stir/Stir-prefix/src/Stir/src/include/stir/VectorWithOffset.h:130:10: note: candidate constructor not viable: requires 0 arguments, but 3 were provided
  inline VectorWithOffset();
         ^

@KrisThielemans
Copy link
Collaborator Author

forgot to push...

@markus-jehl
Copy link
Contributor

Perfect, thanks! It builds now, but importing stir in python leads to an error still:
/workspace/python-reconstruction-pipeline/libs/install/stir/python/_stir.so: undefined symbol: _ZN4stir5ArrayILi1EfEC1ERKNS_10IndexRangeILi1EEEPf

I'll go through the code changes in patience to understand all the things that have changed.

@KrisThielemans
Copy link
Collaborator Author

ok. I didn't try that yet. This appears to be stir::Array<1, float>::Array(stir::IndexRange<1> const&, float*). This isn't implemented yet, but also not used. I'm trying to comment it out and see what happens. Ultimately, the init function should be protected, and exposed via relevant constructors.

It does point to the fact that we're SWIGging too much. (anything with elemT* makes no sense in Python). We could either %ignore them, but might be easiest to put them between #ifdef SWIG in the .h/.inl file.

@KrisThielemans
Copy link
Collaborator Author

I'll rebase this on current master as well. I'm assuming that's ok. Should have done that before creating the PR. sorry

@KrisThielemans
Copy link
Collaborator Author

done. this imports now.

@KrisThielemans
Copy link
Collaborator Author

KrisThielemans commented Aug 24, 2023

ah well, committed a temp stir.i file. Restored now (again with force push)

@markus-jehl
Copy link
Contributor

Ran the latest version this morning through the python reconstruction steps and everything worked fine! Will later also test it in C++, as well as with sanitisers

@KrisThielemans
Copy link
Collaborator Author

that's great. but is it any faster? (or does it use more memory?)

@markus-jehl
Copy link
Contributor

Not noticeably in the setup I used, but I want to test this with C++ where everything is a bit more controllable.

@KrisThielemans
Copy link
Collaborator Author

Oops, I accidentally pushed a merge with #1237. That wasn't my intention. Depending on how I feel, I might still revert that...

@markus-jehl
Copy link
Contributor

Tested it now in C++ and both speed and memory consumption look identical.

@KrisThielemans
Copy link
Collaborator Author

ah well. maybe if we exploit it a bit more. But profiling often throws "conventional wisdom" in the bin...

@KrisThielemans
Copy link
Collaborator Author

Tested it now in C++ and both speed and memory consumption look identical.

the current test_Array contains some (de)allocation timings for a 4D array of size 20x100x400x600. On my desktop I get

creation of non-contiguous 4D Array 654.412ms
deletion 7.956 ms
contiguous array creation (total) 224.647ms
deletion 7.317 ms

That might be non-negligible for GPU applications, but obviously it is essentially 20 3D images, so in practice it might just not matter.

I guess we should add some timings on doing copies etc.

@KrisThielemans
Copy link
Collaborator Author

First job failure due to #1378. gcc12-cuda0 job failure: there seems to be a time-out or something in recon_test_pack/simulate_data.sh. I cannot reproduce this.

@KrisThielemans KrisThielemans force-pushed the Array branch 2 times, most recently from b3ad8c4 to 7cfdadf Compare February 13, 2024 22:40
@KrisThielemans
Copy link
Collaborator Author

I cannot figure out what is wrong with the gcc12-cuda0 (C++-20) job.

  • I've used tmate to ssh in. I can then run the test without problems.
  • 018196f added let the script output to stdout as opposed to a log file (and disabled the ctest). This goes a few more tests further and stalls at a later test at another command which just takes 1s or so (after about 34 minutes, so nothing to do with the 6 hours or so job limit).
  • on my local machine with similar configuration, I have no problems at all.
  • All other jobs are working fine.

One difference is that other jobs have their builds ccached, such that the build time is < 2min, while for this job it is still 23 mins, but I have no idea why that'd be relevant.

stumped. @casperdcl any ideas? (I could remove C++20 etc, but I can't see what this has to do with that)

@KrisThielemans KrisThielemans force-pushed the Array branch 2 times, most recently from 597b35a to ce36182 Compare February 15, 2024 09:33
@KrisThielemans
Copy link
Collaborator Author

I cannot figure out what is wrong with the gcc12-cuda0 (C++-20) job.

Looks like this as a disk-space issue! Seems resolved now.

@KrisThielemans
Copy link
Collaborator Author

gcc12-cuda0 job resolved. gcc12-cuda2.1 has a segfault in test_proj_data_info_subset, which hasn't happened before...

- make sure it works with other STIR classes, not just floats etc
- re-instate VectorWithOffset constructors that take both start and end pointers for backwards compatibility
The NumericVectorWithOffset constructor would only work when T=NUMBER.

The 1D Array constructor taking an `elemT*` wasn't implemented yet.
Seems that VC gets confused by the new Array::swap, so add std:: explicitly
make Array:init from ptr private

test_Array is crashing due to  memory bug in grow
nD Array's now store the "full" memory via shared_ptr<T[]>. This automates
memory management. 1D Array's are still TODO.

test_Array now works ok, except for 1D arrays.

Some testing code remains in test_Array
needed for shared_ptr<T[]>, unless we'd implement deleters ourselves
no longer needed as we're using a shared<T[]> and can test that.
- get rid of copy_data argument for constructors but use sensible defaults
- add VectorWithOffset::init such that Array doesn't need to know much
This was an attempt to handle pre-C++-17, but it would need
work for make_shared.  In any case, it isn't available on VS compilers
apparently.
This was previously only done when finding the CONFIG file
@KrisThielemans
Copy link
Collaborator Author

I have to rebase on master again. Sorry.

@KrisThielemans
Copy link
Collaborator Author

While more can be done here, I will merge it such that we can move on. I just need to add release notes, so anyone (@NikEfth @markus-jehl...) feels like checking, now's the time :-)

@NikEfth
Copy link
Collaborator

NikEfth commented May 22, 2024

Hi Kris, I would like to have a look, but I won't be able before the weekend.
If you need to move forward please do so.

@markus-jehl
Copy link
Contributor

Giving it a go with a python based reconstruction!

@markus-jehl
Copy link
Contributor

Works fine for me :-)

@NikEfth
Copy link
Collaborator

NikEfth commented May 22, 2024 via email

@markus-jehl
Copy link
Contributor

Not noticeably from my one test run, but I haven't done a proper timing comparison. That would take a bit longer.

@KrisThielemans KrisThielemans merged commit f3719b4 into UCL:master May 23, 2024
1 of 8 checks passed
@KrisThielemans KrisThielemans deleted the Array branch May 23, 2024 22:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants