Confusion matrix in terms of map area proportions instead of sample size #381

Merged (32 commits) Feb 26, 2024

Conversation

hannah-rae (Contributor)

@hannah-rae hannah-rae commented Jan 26, 2024

Currently, our intercomparison and other metrics are computed from the confusion matrix expressed in terms of sample size. However, the recommendation in Stehman & Foody 2019 is to compute metrics from the confusion matrix expressed in terms of the map area proportion of each class (which is what we already do for area estimation in src/area_utils.py; see lines 563-567 in that file).
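For context, the estimator referred to here converts each confusion-matrix cell from a sample count n_ij to an area proportion via p_ij = w_i * n_ij / n_i., where w_i is the mapped area proportion of class i and n_i. is the row total. A minimal NumPy sketch with made-up numbers (the helper name and all values are illustrative, not from this PR):

```python
import numpy as np

# Illustrative sketch: convert a sample-count confusion matrix (rows = map
# class, columns = reference class) into area proportions using
# p_ij = w_i * n_ij / n_i. as in Stehman & Foody 2019.
def to_area_proportions(cm: np.ndarray, w_j: np.ndarray) -> np.ndarray:
    row_totals = cm.sum(axis=1, keepdims=True)  # n_i. (samples per map class)
    return w_j[:, np.newaxis] * cm / row_totals  # p_ij (area proportions)

cm = np.array([[90, 10],    # sampled noncrop pixels
               [20, 80]])   # sampled crop pixels
w_j = np.array([0.7, 0.3])  # mapped area proportion of each class (assumed)

p = to_area_proportions(cm, w_j)
assert np.isclose(p.sum(), 1.0)  # proportions sum to 1 by construction
```

The resulting matrix p, rather than the raw counts cm, is what the accuracy metrics would be computed from.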

This PR adds a new function that computes the mapped area for each class and uses these totals to compute the "area matrix". A few issues still need to be addressed:

  • I count the number of pixels in each class by summing the binary values (e.g., if cropland = 1 and noncrop = 0, the sum of the map equals the number of cropland pixels). However, the sum returned by the ee Reducer is a float, where I would expect an integer. [SOLVED] This is because the reduction is weighted by default, so pixels cut by, e.g., the clip() operation contribute fractional weights. I changed this to an unweighted reduction because that is how our area estimation pipeline does it, and integer counts seem more transparent to me. The difference between the two is negligible anyway.
  • Getting the pixel counts for the ensemble in a way that remains flexible about which map(s) we specify for the ensemble is tricky. I am still thinking about the best way to do this. [SOLVED]
  • The F1 scores for the Rwanda test case do not look right. I am still investigating this. [SOLVED] This was because the F1 score (and some of the standard errors) were computed using the sample metrics from the sklearn classification report, while others were computed using the population error matrix.
  • Update all of the notebooks under maps
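The weighted-vs-unweighted distinction in the first bullet can be illustrated without Earth Engine. A plain NumPy stand-in, where pixels cut by clip() carry fractional weights (all values made up for illustration):

```python
import numpy as np

# Illustrative sketch (NumPy, not Earth Engine) of why the default weighted
# reduction returns a float: pixels clipped at the AOI boundary carry
# fractional weights, while an unweighted reduction counts whole pixels.
values = np.array([1, 1, 1, 0, 1])             # binary crop/noncrop pixels
weights = np.array([1.0, 1.0, 0.4, 1.0, 0.7])  # edge pixels are partial

weighted_sum = float((values * weights).sum())  # default reduceRegion behavior
unweighted_sum = int(values.sum())              # unweighted: whole pixel count

# weighted_sum is ~3.1 (a float); unweighted_sum is 4 (an integer)
```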


@hannah-rae hannah-rae self-assigned this Jan 26, 2024
@hannah-rae hannah-rae added the enhancement New feature or request label Jan 26, 2024
.get("crop")
.getInfo()
)

@ivanzvonkov ivanzvonkov (Collaborator) Jan 30, 2024

Here's an alternative for getting the crop/non-crop proportion without needing two reduceRegion calls.
It may also be more "accurate", since the proportions are computed from pixel area rather than pixel count.

geom = aoi.geometry()
area = geom.area()
crop_proportion = (
    binary_image
    .multiply(ee.Image.pixelArea())  # Multiplies each pixel by its area
    .divide(area)                    # Divides each pixel by the total area
    .reduceRegion(
        reducer=ee.Reducer.sum(),
        geometry=geom,
        scale=self.resolution,
        maxPixels=1e12,
        bestEffort=True,  # Should reduce scale if computation times out
    )
    .get("crop")
    .getInfo()
)

w_j = np.array([1 - crop_proportion, crop_proportion])
return w_j

Collaborator
Can be multiplied by area to get a_j
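A minimal sketch of that suggestion, with assumed numbers (w_j and the total area are illustrative; in the snippet above, area would come from geom.area().getInfo()):

```python
import numpy as np

# Illustrative sketch: scaling the class proportions w_j by the total AOI
# area gives the mapped area of each class, a_j. Values are assumptions.
w_j = np.array([0.7, 0.3])  # noncrop/crop proportions (illustrative)
area = 1_000_000.0          # total AOI area in m^2 (illustrative)

a_j = w_j * area            # mapped area per class
```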

Collaborator
Never mind, I see you need the exact pixel counts for the producer's variance estimator 🤦

if precision + recall == 0:
    return 0
f1_score = 2 * (precision * recall) / (precision + recall)
return f1_score
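Connecting this to the fix described in the PR body (F1 computed from the population error matrix rather than sklearn's sample metrics), a hedged sketch with an illustrative area-proportion matrix:

```python
import numpy as np

# Illustrative sketch: precision (user's accuracy) and recall (producer's
# accuracy) for the crop class, computed directly from a population
# (area-proportion) error matrix. Rows = map class, columns = reference
# class; the values are made up, not from the PR.
p = np.array([[0.63, 0.07],
              [0.06, 0.24]])

precision = p[1, 1] / p[1, :].sum()  # user's accuracy of the crop class
recall = p[1, 1] / p[:, 1].sum()     # producer's accuracy of the crop class

if precision + recall == 0:
    f1_score = 0.0
else:
    f1_score = 2 * (precision * recall) / (precision + recall)
```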
Collaborator

Thanks for adding this!

@ivanzvonkov ivanzvonkov (Collaborator) left a comment

This looks excellent! Looks like worldcover got a bit of a boost from these area weights. Thank you for updating the template notebook and the descriptions in each notebook.

@hannah-rae hannah-rae merged commit 47cb0e5 into master Feb 26, 2024
8 checks passed
@hannah-rae hannah-rae deleted the stderr-fix branch February 26, 2024 17:16