Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: address masking issues identified in #44 #45

Merged
merged 15 commits into from
Dec 17, 2024
Merged

Conversation

ceholden
Copy link
Collaborator

@ceholden ceholden commented Dec 6, 2024

Description

This PR is intended to address issues raised in #44, specifically,

  1. We should be masking 0% reflectance as invalid (we had already masked <0) ("zero reflectance should be filtered out as well")
  2. We should not mask >100% reflectance as these values are often "valid" but >100% because of unmet assumptions about topography (see, Current issues with VI #44 (comment))
  3. We should use -19_999 as a nodata value instead of -9999 because -9999 inside the valid range of normalized differenced indexes (which when scaled ranged from [-10_000, 10_000]. This -19_999 was picked for consistency with the Landsat on-demand vegetation index products (e.g., Landsat NDVI)
  4. Use the union of surface reflectance masks for all bands ("Pixel with negative or zero reflectance in any band should have no_data for all VIs")
  5. We should avoid numerical computation issues when converting from float64 -> int16, specifically overflows that create a "wrap around" when downcasting.

How I did it

  1. Change from masked_outside to masked_less_equal and set <= 0
  2. Change from masked_outside to masked_less_equal, which no longer masks >100% reflectance
  3. Define fill_value per Index enum member, set this fill value when calculating the index, and write the GeoTIFF with the fill value (-19_999)
  4. Union the nodata masks by "or"-ing the nodata mask of all bands
  5. See below,

For the issue with wraparound when downcasting, the example in test data was for EVI which can exceed [-10,000, 10,000] because it is not normalized. Consider this example from one of the granules in the test dataset,

# without clipping our large positive number becomes negative, which is misleading
>>> np.float64(192_583.3333).astype(np.int16)
-4025

# if we clip first the value is still a large positive number
>>> np.clip(np.float64(192_583.333), a_min=np.iinfo("int16").min, a_max=np.iinfo("int16").max).astype(np.int16)
32767

How you can test it

  • I added a specific unit test for (4), "mask bands by union of value range masks from all bands"
  • I added a test for all masking related requirements that I'm aware of and test each one as a separate parameterized test run
  • I updated the TIF files we keep for the "expected data" so existing unit tests shoudl work

@ceholden ceholden marked this pull request as ready for review December 6, 2024 21:48
@ceholden ceholden changed the title fix: address issues identified in #44 fix: address masking issues identified in #44 Dec 6, 2024
Copy link
Collaborator

@chuckwondo chuckwondo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! I really like the additional tests. I have only 1 completely ignorable comment that's perhaps a bit pedantic (not pydantic). Feel free to skip it.

hls_vi/generate_indices.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@chuckwondo chuckwondo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fantastic!

@ceholden ceholden merged commit 2856231 into main Dec 17, 2024
1 check passed
@ceholden ceholden deleted the ceh/issue44-vi-fix branch December 17, 2024 18:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants