Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: PERF: Use NeighborhoodRange for metric image computation #98

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

thewtex
Copy link
Collaborator

@thewtex thewtex commented Oct 17, 2018

Depends on:

http://review.source.kitware.com/#/c/23795/4

Currently 4X slower

Closes #97

@thewtex
Copy link
Collaborator Author

thewtex commented Oct 17, 2018

CC: @N-Dekker @phcerdan

Experimenting with the proposed neighborhood range constant pixel access policy. 🔬 Currently, this seems to be quite a bit slower, though. 🐌

@phcerdan
Copy link

Hey @thewtex, what script/test are you using for testing the performance?

@thewtex
Copy link
Collaborator Author

thewtex commented Oct 18, 2018

@phcerdan itkBlockMatchingBayesianRegularizationDisplacementCalculatorTest.

using RangeType = Experimental::ShapedImageNeighborhoodRange<const MetricImageType,
Experimental::ConstantBoundaryImageNeighborhoodPixelAccessPolicy<const MetricImageType> >;
const RangeType movingRange{ *movingMinusMean, denomIt.GetIndex(), offsets };
const RangeType kernelRange{ *fixedMinusMean, denomIt.GetIndex(), offsets };

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it really intended that the kernel is set on a new location (denomIt.GetIndex()) with every iteration?

const RangeType movingRange{ *movingMinusMean, denomIt.GetIndex(), offsets };
const RangeType kernelRange{ *fixedMinusMean, denomIt.GetIndex(), offsets };

const MetricImagePixelType normXcorr = std::inner_product( movingRange.begin(), movingRange.end(), kernelRange.begin(), 0.0 ) / denomIt.Get();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to admit I could not get optimal performance when using std::inner_product at https://github.com/InsightSoftwareConsortium/ITK/blob/master/Modules/Core/ImageFunction/include/itkGaussianDerivativeImageFunction.hxx#L219, so instead I just manually wrote the inner product calculation in a few lines of code. In this case, it might look as follows:

auto normXcorr = NumericTraits<MetricImagePixelType>::Zero;
auto movingRangeIterator = movingRange.cbegin();

for (const auto& kernelValue : kernelRange)
{
  normXcorr += kernelValue * (*movingRangeIterator);
  ++movingRangeIterator;
}
normXcorr /= denomIt.Get();

@N-Dekker
Copy link

N-Dekker commented Oct 18, 2018

This use case might suggest adding ShapedImageNeighborhoodRange::SetLocation(const IndexType&), so that the ranges could be declared and initialized outside the loop, and then have their location updated inside the loop, by calling range.SetLocation(denomIt.GetIndex()) for each range.

Update: The member function is added with patch set 5: http://review.source.kitware.com/#/c/23795/5

@thewtex
Copy link
Collaborator Author

thewtex commented Oct 18, 2018

Great ideas @N-Dekker ! 💡

And thanks for adding the SetLocation patch to move reconstruction out of the loop. I will try that and manually expanding the inner product. This will probably happen next week.

On a related note, I was examining the patch and the existing code, and the construction of the proxy in operater*():

https://github.com/InsightSoftwareConsortium/ITK/blob/a5d40b033f809b8d54e0bda9bd1eb5e7613faf69/Modules/Core/Common/include/itkShapedImageNeighborhoodRange.h#L348-L351

Do you think the construction here has performance implications and would it be avoidable?

@N-Dekker
Copy link

N-Dekker commented Oct 18, 2018

I was examining the patch and the existing code, and the construction of the proxy in operater*():
Do you think the construction here has performance implications and would it be avoidable?

The use of a proxy as return type of operater*() is hard to avoid, for a constant-boundary iterator. Because operater*() cannot return a 'real' reference to an existing pixel from the image, when trying to retrieve a pixel value outside the image boundaries.

However, looking at http://review.source.kitware.com/#/c/23795/5/Modules/Core/Common/include/itkConstantBoundaryImageNeighborhoodPixelAccessPolicy.h my intuition tells me that it might be possible to squeeze some CPU cycles out of the private helper function CalculatePixelIndexValue, which is called with every operater*(). CalculatePixelIndexValue has an if statement and a return inside its for loop which might have performance implications... but I'm not sure! The compiler optimizer might be smarter than my intuition!

Update: With patch set 7 I adjusted the private helper function CalculatePixelIndexValue, removing the return from inside its for loop, and indeed, it appears to significantly improve the performance! Using VS2017 64-bit Release, I observed a reduction of more than 30% of the run-time duration!

@N-Dekker
Copy link

Experimenting with the proposed neighborhood range constant pixel access policy. 🔬 Currently, this seems to be quite a bit slower, though. 🐌

@thewtex Which compiler do you use? (Release build, optimized for speed?) I'm asking because I observed significant PERF differences between VS2015 and VS2017 (both 64-bit Release), while running the example code that I posted at https://discourse.itk.org/t/custom-border-extrapolation-of-shapedimageneighborhoodrange-by-imageneighborhoodpixelaccesspolicy/879/27

@thewtex
Copy link
Collaborator Author

thewtex commented Oct 21, 2018

@N-Dekker this was Clang / MinSizeRel build. I will try other compilers and other build configurations, along with your example code!

@N-Dekker
Copy link

N-Dekker commented Oct 22, 2018

@thewtex I'm interested to see your results! Using VS2017 Release, I found that NeighborhoodRange based iteration was almost 3x faster than the old-school itk::ConstNeighborhoodIterator. Unfortunately, I have bad news... or is it good news?!? With the ConstNeighborhoodIterator patch http://review.source.kitware.com/#/c/23813/ that I just posted, the old-school iteration actually beats the NeighborhoodRange!!! At least when using VS2017 Release, on the example that I posted at https://discourse.itk.org/t/custom-border-extrapolation-of-shapedimageneighborhoodrange-by-imageneighborhoodpixelaccesspolicy/879/27 However, there are still some good reasons to consider NeighborhoodRange:

  • The range class supports very fast change of location to an arbitrary position, especially when we add ShapedImageNeighborhoodRange::SetLocation (while the old itk::ConstNeighborhoodIterator::SetLocation is clearly "not optimized for speed").
  • It is thread-safe (wheras the old itk::ConstNeighborhoodIterator has unsafe mutable data).
  • It has a modern C++ interface.

@thewtex
Copy link
Collaborator Author

thewtex commented Nov 7, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants