
[FIX] removes memory leak issue by removing c code and using numpy/scipy functions #9

Merged: 24 commits, Jan 19, 2024

Conversation

@bpinsard commented Aug 7, 2023

Fixes #6
The "optimized" c-code was not performing any faster than numpy/scipy functions.

TODO:

  • write tests (including testing that these changes do not affect results).
  • add style/linting actions
  • add actions for running tests automatically
  • add actions to deploy to pypi

@glasserm commented Aug 9, 2023

Is this ready for consideration? Has the patch been tested?

@bpinsard (Author) commented Aug 9, 2023

This is still a draft/WIP; I still need to write tests (including regression tests to check that the results are consistent with what was previously generated).
I opened the PR to start discussions.

@coalsont (Member) commented:

This has merge conflicts, likely due to all the style changes. It looks like nothing similar has been integrated via another PR; can you tidy this PR by not including the style changes?

@bpinsard (Author) commented Jan 9, 2024

I rebased, removing the commit with the aggressive black formatting; there are still some trailing-whitespace deletions that my editor performed automatically, but it should be more readable.
I had started to work on "regression" tests to check whether the scipy code gives results similar to the custom C code; the results are not exactly the same, but close enough. This required a dummy spherical harmonics coefficients file, which I tweaked from a real one that cannot be shared due to Siemens copyright.

Review comment on setup.py (outdated, resolved)
@effigies left a comment

Seems like a pretty straightforward improvement. Do you have benchmarks? Or would you just time the regression test before/after the refactor?

@coalsont If this is accepted and extensions no longer need to be compiled, I will open a PR to update the CI to only build a single, pure Python wheel before you cut a new release.

Review comments on gradunwarp/core/utils.py and gradunwarp/core/unwarp_resample.py (outdated, resolved)
@coalsont (Member) commented Jan 10, 2024

Looks like the test may be looking in the wrong folder for the coefficients file when run as a GitHub Action.

@effigies commented:

The test data doesn't get packaged by default. You should be able to add this to setup.cfg:

[options.package_data]
gradunwarp.core.tests = data/*

Note that this will increase package size from about 16k to 1.1M. We could split out the tests/ directory so that the tests and their data are packaged in the source distribution but not installed into the package. I've fiddled around with this and can open a PR against @bpinsard's branch. If that's adding too much complication to this PR, I can wait until it's done and submit another to reduce the package size.
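
One possible shape for that split (a sketch only, assuming the tests move to a top-level tests/ directory and that packages are discovered via a setuptools find: configuration, which may not match the current setup.py):

# MANIFEST.in: ship the tests and their data in the source distribution
graft tests

# setup.cfg: keep them out of the installed package
[options.packages.find]
exclude =
    tests
    tests.*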

@bpinsard (Author) commented:

Thanks for the in-depth code review; there are indeed a lot of suboptimal parts in the original code, which is itself somewhat hard to read.

Seems like a pretty straightforward improvement. Do you have benchmarks? Or would you just time the regression test before/after the refactor?

I had timed it interactively and it was equivalent if not faster, but I have no actual numbers. It seems the C extensions were created a long time ago, when the numpy/scipy equivalents did not exist; we can expect the numpy/scipy code to be of better quality, performance, and robustness.
The approach you suggest, timing the regression test, seems the easiest way; I will give it a try.

@coalsont If this is accepted and extensions no longer need to be compiled, I will open a PR to update the CI to only build a single, pure Python wheel before you cut a new release.

@effigies left a comment

Some packaging notes.

Review comments on setup.py and gradunwarp/core/tests/test_regression.py (resolved)
@bpinsard (Author) commented:

If you have more suggestions on making the regression test better, or on understanding why using scipy's legendre gives differences at the 5th decimal, I am all ears. Ideally we should try to decrease that discrepancy, but maybe that's impossible.

@effigies commented:

float32 has 7.22 decimal digits of precision. If your values are >10, the 7th digit will be the fifth digit after the decimal, which is what assert_array_almost_equal is actually checking. We know that Krish's legendre used floats; if scipy's legendre uses doubles for some portions of the calculation, that could be a source of difference. Also, fp arithmetic is not associative, so any reordering of operations in the implementation could be to blame.

What if you try assert_allclose? That defaults to a relative tolerance of 1e-7, which is more suited to comparing the results of single-precision arithmetic.
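
For illustration, a toy comparison of the two assertions (the values are invented, roughly mimicking the magnitudes above):

import numpy as np
from numpy.testing import assert_array_almost_equal, assert_allclose

# float32 values of magnitude ~10 that differ by a few 1e-5,
# i.e. agreement to roughly 6 significant digits
a = np.float32([-8.649530, -7.041620])
b = np.float32([-8.649558, -7.041609])

assert_array_almost_equal(a, b, decimal=4)   # passes: |a - b| < 1.5e-4
# assert_array_almost_equal(a, b, decimal=5) # fails: max |a - b| is ~2.8e-5

assert_allclose(a, b, rtol=1e-5)             # passes: relative error here is ~3e-6
# assert_allclose(a, b)                      # fails: the default rtol=1e-7 is well
#                                            # below the ~3e-6 relative differences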

@bpinsard (Author) commented Jan 10, 2024

Thanks for your input.

assert_allclose gives the following

E           AssertionError: 
E           Not equal to tolerance rtol=1e-07, atol=0
E           
E           Mismatched elements: 148042 / 216000 (68.5%)
E           Max absolute difference: 2.86567956e-05
E           Max relative difference: 0.03226674
E            x: array([[[-4.727971, -7.04162 , -8.64953 , ..., -8.649536, -7.04162 ,
E                    -4.727949],
E                   [-3.05993 , -5.125746, -6.616065, ..., -6.61608 , -5.125739,...
E            y: array([[[-4.727948, -7.041609, -8.649537, ..., -8.649537, -7.041609,
E                    -4.727948],
E                   [-3.05992 , -5.12574 , -6.616078, ..., -6.616078, -5.12574 ,...

Just removing the inefficient cos(arccos(x)) computation amplified the max discrepancy by one decimal.

I guess it depends on what we want to achieve: I would say we want close results, but how close do we want them?
It is not a true regression test, more a way to check that this large refactor does not badly break the algorithm.

Also, these displacement values in mm should be considered in the context of MRI resolution; interpolating data using such a field will likely result in very minor differences too.
Should we choose the threshold based on typical MRI resolution?

@effigies commented:

This is not my project, so I can't say what the threshold should be for an acceptable change. If it were, I think my response would be:

  1. If a user wants bit-for-bit reproducibility, they should use the same version of the software. Some change across versions is acceptable.
  2. It would be good to understand the changes piecemeal.
    • What is the change from the custom legendre() implementation vs scipy.special.lpmv(), all else held constant? Here, I suspect that the advantage of using an externally vetted implementation would outweigh a small change in results, even if it exceeds np.allclose()'s default tolerances (see the sketch after this list).
    • For further changes, it would be nice to see that they either have no effect on the output or that they can be argued for in terms of reducing the accumulation of floating point error. Removing a cos(arccos(x)) round trip is an example where I would say the error is definitively reduced. But this can only be argued in isolation; alongside a bunch of other changes, it's hard to say.
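
A minimal sketch of one way to probe the first point, checking how sensitive scipy.special.lpmv itself is to inputs rounded through single precision (it assumes nothing about the old C extension's API, and the degrees/orders are arbitrary):

import numpy as np
from scipy.special import lpmv

# evaluate the associated Legendre functions on float64 inputs and on the
# same inputs rounded through float32, to see how much error the input
# rounding alone introduces
x64 = np.linspace(-1.0, 1.0, 1001)
x32 = x64.astype(np.float32).astype(np.float64)

for m in range(4):
    for n in range(m, 8):
        err = np.max(np.abs(lpmv(m, n, x64) - lpmv(m, n, x32)))
        print(f"m={m} n={n} max abs diff {err:.3e}")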

Given the suggestion that changes in displacements of ~1e-5mm are going to have a negligible impact on the actual interpolation, perhaps an additional regression test would be to correct some image. A 1mm^3 resolution MNI template, for example, would allow you to both visually assess and quantify the change in distortion. If that image is allclose(), that would help justify a more lax threshold for these specific outputs.
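
A rough sketch of that image-level check (file names are hypothetical; the point is to compare the corrected images rather than the raw displacement fields):

import numpy as np
import nibabel as nib

old = nib.load("mni_unwarped_with_c_code.nii.gz").get_fdata()
new = nib.load("mni_unwarped_with_scipy.nii.gz").get_fdata()

print("max abs diff:", np.abs(old - new).max())
print("allclose:", np.allclose(old, new, rtol=1e-5, atol=1e-3))  # tolerances to be agreed on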

But all this is the call of the maintainers. @coalsont @glasserm WDYT?

@coalsont (Member) commented:

@mharms thoughts on simplifying the gradunwarp code in future releases in a way that slightly changes the produced warpfield displacements by up to 30 nanometers?

I think that should be fine, but I'm more worried that Siemens may not be satisfied with how the dummy test coefficients file was constructed (if it is insufficiently different from the real one used as a starting point, such that it might still reveal things about the construction of their coils).

@mharms commented Jan 12, 2024

Yes, I agree that the tolerances being discussed here seem fine, and that the differences being discussed are not relevant from a practical perspective.

I can't comment on the "test" coefficient file: whether it is "insufficiently different", or conversely whether it is so different from the specification of a real gradient coil that the resulting regression test is not actually meaningful.

@bpinsard (Author) commented:

Regarding the coefficient file, I guess we could try to make one from scratch.
Are there any references that could help us understand what the current software's implementation is based on?

While I was trying to reimplement the coeff parser https://github.com/Washington-University/gradunwarp/blob/master/gradunwarp/core/coeffs.py#L119 to be more robust (using regexps to read lines), I was really puzzled by these lines https://github.com/Washington-University/gradunwarp/blob/master/gradunwarp/core/coeffs.py#L138-L144, which skip a number of coefficients in the dummy file (which has the same structure as the one from our scanner).
So, for instance, it will only start taking data at 4 A( 9, 0) in the dummy file above.
Is there any reason to do so? Did it also skip these lines for the HCP data, or was the HCP scanner coefficient file structure different from ours?
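
For reference, the regexp-based parsing mentioned above could look roughly like this (the line layout is guessed from entries such as "4 A( 9, 0)" and may not cover every vendor file):

import re

# guessed layout: index, A or B, degree, order, coefficient value
COEF_RE = re.compile(r"^\s*(\d+)\s+([AB])\(\s*(\d+),\s*(\d+)\)\s+([-+0-9.eE]+)")

def parse_coef_line(line):
    match = COEF_RE.match(line)
    if match is None:
        return None
    _, ab, degree, order, value = match.groups()
    return ab, int(degree), int(order), float(value)

print(parse_coef_line(" 4 A( 9, 0)   -0.123456"))  # -> ('A', 9, 0, -0.123456)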

Thanks for your insight!

@bpinsard force-pushed the fix/mem_leaks branch 2 times, most recently from 1ae6650 to ccfcead on January 12, 2024
@bpinsard (Author) commented:

I wrote a dummier grad file with way fewer coefficients (lower orders of spherical harmonics, I guess), which is also faster to test.
I rewrote the git history to remove the old file to prevent any lawsuit :D .

@coalsont (Member) commented:

That does look much less related to an actual scanner, thanks.

The history rewrite appears to have diverged at 1.2.1 and made copies of many master commits, while current master does not appear to have the coefficients file in question (and no other commit in any branch appears to create any *.grad file). Can you try to redo that from current master, rather than 1.2.1? Maybe the merge commit messed with things...you could cherry-pick just the commits you made...

@bpinsard (Author) commented:

rebase done.

@bpinsard marked this pull request as ready for review on January 15, 2024
@coalsont (Member) commented:

Rebase looks proper, and the code looks good at a glance. It looks like setup.py is still unhappy; it seems to contain remnants of the C code modules.

Review comment on gradunwarp/core/coeffs.py (outdated, resolved)
@coalsont (Member) commented:

Looks ready to merge, but I'll give it a day for any other opinions.

@coalsont merged commit 4af3c75 into Washington-University:master on Jan 19, 2024
22 checks passed
Linked issue: gradunwarp memory leak
5 participants