Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

3D Feature Extraction debugging #122

Merged
merged 11 commits into from
Dec 10, 2024
Merged

3D Feature Extraction debugging #122

merged 11 commits into from
Dec 10, 2024

Conversation

nrepina
Copy link
Collaborator

@nrepina nrepina commented Aug 24, 2024

This PR addresses a few bug fixes in the Feature Extraction task:

  • Bug fix in feature extraction to handle child object masking by parent. Previously masking was performed by selecting the label of child object that corresponds to parent object. This approach works for masking of objects of the same child/parent class (e.g. organoids masked by organoids), but results in an empty label image for child/parent relations, except in the rare case that the parent object happens to have the same label as one of the children objects. Proposed fix is to load the parent object segmentation image and mask by the binary image. This solution handles masking for both child/parent relations and objects of the same class, however to reduce code modification and backwards compatibility the previous method was kept in the case that a masking label name is not supplied. Current tests pass, but should be extended to check for the child/parent masking case. @jluethi See TODO notes in task for better handling of edge cases (e.g. upscaling in case of different array shapes)
  • Improved logging and check for empty arrays.
  • Fix Anndata dtype deprecation warning.
  • Cast current region label to float prior to int conversion to handle ValueError in case of ROI label that is stored as string of float.

@jluethi Please have a look and we can decide on the best approach for these fixes before merging into main.

If ROI label stored as string of float, int() cannot convert and get error ValueError: invalid literal for int() with base 10: '1.0'
Load parent object segmentation and perform masking to select child objects belonging to parent. This fix now allows feature extraction of child objects (previously result was empty array). Also improved logging.
@nrepina nrepina requested a review from jluethi August 24, 2024 15:09
@codecov-commenter
Copy link

codecov-commenter commented Aug 24, 2024

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 0% with 15 lines in your changes missing coverage. Please review.

Project coverage is 0.26%. Comparing base (6ac2465) to head (4c8ebe3).
Report is 11 commits behind head on main.

Files Patch % Lines
...tiplex/fractal/scmultiplex_feature_measurements.py 0.00% 15 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@           Coverage Diff            @@
##            main    #122      +/-   ##
========================================
- Coverage   0.27%   0.26%   -0.01%     
========================================
  Files         54      55       +1     
  Lines       4805    4942     +137     
========================================
  Hits          13      13              
- Misses      4792    4929     +137     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Ensures expanded labels do not cross into neighboring object.
@jluethi
Copy link
Contributor

jluethi commented Aug 24, 2024

Hey @nrepina

This looks good, great to hear that it's working now.
I looked through the tests and we really only test the one masking case where we measure on the same objects that are also our mask. So far, all measurement tests just use the original, very tiny dataset. And that test dataset only has one set of labels (only nuclei, no organoids).

We now also have the linking dataset available in the test suite that you've added @nrepina (see https://zenodo.org/records/10683087). I'll work towards getting a test on that one running, because it actually contains nuclei & organoids => we can actually test for this case with the new test dataset :)

The upsampling already seems nicely integrated, though testing this further should wait for ngio flexibility.

Thus, I think the TODO list is:

  • Create a test case that fails with old version (or with wrong selected table), but passes in new version
  • Get selection of label image to load from table attrs
  • Validate that table is masking ROI table

@jluethi
Copy link
Contributor

jluethi commented Aug 25, 2024

Turns out, using the https://zenodo.org/records/10683087 dataset for this tests won't be trivial atm (until there is a better fix for fractal-analytics-platform/fractal-tasks-core#771). The test data contains organoids that have a lower resolution label image than the nuc.

I'll give it a quick shot to generate a "psuedo-org" label image on-the-fly to be able to run those tests

@nrepina
Copy link
Collaborator Author

nrepina commented Aug 26, 2024

Hey @jluethi, great, sounds like a plan. It has been on my to-do list to update the Zenodo test dataset with an object segmentation that matches the child segmentation level. I will get to that in the coming days and this should resolve the testing case until Fractal has a better solution for upscaling :) Will ping you once it is updated.

In the updated dataset, organoid and cell segmentation both performed on same zarr level 0, which allows for testing of cell feature extraction with masking. Expected test results values updated accordingly (only for integration tests) due to minor changes in segmentation.
@nrepina
Copy link
Collaborator Author

nrepina commented Oct 23, 2024

@jluethi update of Zenodo dataset is finished! Now we should be able to cover the use case of masked single-cell feature extraction with a test dataset. I updated the scmpx integration tests, it's now ready for you to update the feature extraction tests and merge to main :)

@jluethi
Copy link
Contributor

jluethi commented Dec 6, 2024

Hey @nrepina
I'm starting work on this now. I fixed a tasks core dependency issue. There is a second test currently failing, which is an test_scmultiplex_mesh_measurements - AssertionError. Can you look into that one? The error is in the test_scmultiplex_mesh_measurements test function and I'm not familiar with the details of the mesh measurements:

def test_scmultiplex_mesh_measurements(linking_zenodo_zarrs, name=name_3d):


assert_almost_equal(
                output[0, :], test_scmultiplex_mesh_measurements_expected_output, decimal=4
            )

E           AssertionError: 
E           Arrays are not almost equal to 4 decimals
E           
E           Mismatched elements: 9 / 9 (100%)
E           Max absolute difference: 531.62355703
E           Max relative difference: 0.1233685
E            x: array([4.8409e+03, 1.4160e+03, 1.0232e+00, 4.8842e-01, 9.9331e-01,
E                  6.6914e-03, 9.3859e-03, 1.1375e+00, 1.0115e+00], dtype=float32)
E            y: array([4.3092e+03, 1.3030e+03, 1.0175e+00, 4.6225e-01, 9.9401e-01,
E                  5.9891e-03, 9.5799e-03, 1.1295e+00, 1.0087e+00])

=> the data calculated in the test is not what the test expects.

I'll still have to update the tests for the measurement to actually test for the new behavior & use the new test data.

@jluethi
Copy link
Contributor

jluethi commented Dec 6, 2024

@nrepina I now added a test to check for the new behavior. Before the fix, it only got 1 measurement from your very nice new test dataset. After the fix, it gets 20 measurements and they look reasonable. So I'd say it's working well! :)

I also simplified the task logic a bit. I now load the masking_label_name directly from the roi table metadata => no more reason to add it as an input. And that allows to simplify a few of the newly added code paths, make the checks a bit simpler and such.

Now something in the old test case appears to break with the new masking approach. I'll have to look into that a bit more early next week. I think it's related to handling of empty arrays or missing labels or such. Not 100% sure yet.

Besides that, this PR now looks good and the scmultiplex measurements now make correct & tested masked measurement. The only thing missing to merge this is fixing the test_scmultiplex_mesh_measurements - AssertionError. After that, I'd say we can merge to main :)

@jluethi
Copy link
Contributor

jluethi commented Dec 9, 2024

@nrepina I now fixed the one remaining issue coming from my refactoring. We also need to upscale the masking images, as we in principle support masking images of a lower res than the intensity image (just not when there is a non-integer upscaling factor). This is also added now. We can certainly refactor this test in the future, as it's somewhat slow (40s).

Let's figure out what goes wrong with the Mesh measurements, then this should be ready to merge

@jluethi
Copy link
Contributor

jluethi commented Dec 9, 2024

For the record: The remaining issue with the Mesh Measurement task comes down to a difference between VTK 9.3.1 vs. VTK 9.4.0. Let's see if we can figure out which version is "more correct" / what actually changed

@jluethi jluethi mentioned this pull request Dec 10, 2024
@jluethi
Copy link
Contributor

jluethi commented Dec 10, 2024

I've pinned VTK to 9.3.1 for the moment and will merge this. It will be important to look into this more (see #124)

@jluethi jluethi merged commit ee77d20 into main Dec 10, 2024
2 checks passed
@jluethi jluethi deleted the featx_debug branch December 10, 2024 13:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants