Weighted Average #833

Open
wants to merge 37 commits into
base: main

philipc2
Member

@philipc2 philipc2 commented Jul 2, 2024

Closes #826

Overview

  • Implements weighted average functionality for a UxDataArray

@philipc2
Member Author

philipc2 commented Jul 2, 2024

@rytam2

I've set up the boilerplate for the weighted mean functionality. This should be a good place to get started. We can run over this during today's meeting.

@philipc2
Member Author

philipc2 commented Jul 5, 2024

@rytam2

We have fixed the issue with the quad-hexagon grid. I've added it back to the test case.

@philipc2 philipc2 added the run-benchmark Run ASV benchmark workflow label Jul 17, 2024
github-actions bot commented Jul 17, 2024

ASV Benchmarking

Benchmark Comparison Results

Benchmarks that have improved:

| Change | Before [b11d011] | After [a0be1c1] | Ratio | Benchmark (Parameter) |
|---|---|---|---|---|
| - | 445M | 375M | 0.84 | face_bounds.FaceBounds.peakmem_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/ugrid/quad-hexagon/grid.nc')) |
| - | 467M | 373M | 0.80 | mpas_ocean.Integrate.peakmem_integrate('480km') |
| | failed | 412±7μs | n/a | mpas_ocean.WeightedMean.time_weighted_mean_face_centered('120km') |
| | failed | 341±6μs | n/a | mpas_ocean.WeightedMean.time_weighted_mean_face_centered('480km') |

Benchmarks that have stayed the same:

| Change | Before [b11d011] | After [a0be1c1] | Ratio | Benchmark (Parameter) |
|---|---|---|---|---|
| | 375M | 376M | 1.00 | face_bounds.FaceBounds.peakmem_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/mpas/QU/oQU480.231010.nc')) |
| | 375M | 375M | 1.00 | face_bounds.FaceBounds.peakmem_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/scrip/outCSne8/outCSne8.nc')) |
| | 400M | 379M | 0.95 | face_bounds.FaceBounds.peakmem_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/ugrid/geoflow-small/grid.nc')) |
| | 1.59±0.02s | 1.58±0.01s | 1.00 | face_bounds.FaceBounds.time_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/mpas/QU/oQU480.231010.nc')) |
| | 224±0.9ms | 223±4ms | 0.99 | face_bounds.FaceBounds.time_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/scrip/outCSne8/outCSne8.nc')) |
| | 2.04±0.02s | 2.01±0.02s | 0.99 | face_bounds.FaceBounds.time_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/ugrid/geoflow-small/grid.nc')) |
| | 7.90±0.3ms | 8.07±0.2ms | 1.02 | face_bounds.FaceBounds.time_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/ugrid/quad-hexagon/grid.nc')) |
| | 3.02±0.03s | 3.08±0.03s | 1.02 | import.Imports.timeraw_import_uxarray |
| | 674±20ms | 669±7ms | 0.99 | mpas_ocean.ConnectivityConstruction.time_face_face_connectivity('120km') |
| | 41.9±0.6ms | 42.1±0.5ms | 1.01 | mpas_ocean.ConnectivityConstruction.time_face_face_connectivity('480km') |
| | 1.83±0.03ms | 1.82±0.03ms | 0.99 | mpas_ocean.ConnectivityConstruction.time_n_nodes_per_face('120km') |
| | 538±10μs | 557±20μs | 1.03 | mpas_ocean.ConnectivityConstruction.time_n_nodes_per_face('480km') |
| | 1.12±0μs | 1.06±0μs | 0.95 | mpas_ocean.ConstructTreeStructures.time_ball_tree('120km') |
| | 280±1ns | 270±2ns | 0.96 | mpas_ocean.ConstructTreeStructures.time_ball_tree('480km') |
| | 770±4ns | 759±6ns | 0.99 | mpas_ocean.ConstructTreeStructures.time_kd_tree('120km') |
| | 270±1ns | 257±2ns | 0.95 | mpas_ocean.ConstructTreeStructures.time_kd_tree('480km') |
| | 432M | 432M | 1.00 | mpas_ocean.GeoDataFrame.peakmem_to_geodataframe('120km', False) |
| | 407M | 407M | 1.00 | mpas_ocean.GeoDataFrame.peakmem_to_geodataframe('120km', True) |
| | 379M | 379M | 1.00 | mpas_ocean.GeoDataFrame.peakmem_to_geodataframe('480km', False) |
| | 393M | 377M | 0.96 | mpas_ocean.GeoDataFrame.peakmem_to_geodataframe('480km', True) |
| | 1.02±0.01s | 1.03±0.01s | 1.01 | mpas_ocean.GeoDataFrame.time_to_geodataframe('120km', False) |
| | 53.2±0.4ms | 52.7±0.4ms | 0.99 | mpas_ocean.GeoDataFrame.time_to_geodataframe('120km', True) |
| | 78.0±0.3ms | 79.3±1ms | 1.02 | mpas_ocean.GeoDataFrame.time_to_geodataframe('480km', False) |
| | 5.50±0.2ms | 5.51±0.08ms | 1.00 | mpas_ocean.GeoDataFrame.time_to_geodataframe('480km', True) |
| | 319M | 321M | 1.01 | mpas_ocean.Gradient.peakmem_gradient('120km') |
| | 296M | 296M | 1.00 | mpas_ocean.Gradient.peakmem_gradient('480km') |
| | 2.79±0.02ms | 2.79±0.06ms | 1.00 | mpas_ocean.Gradient.time_gradient('120km') |
| | 308±5μs | 320±6μs | 1.04 | mpas_ocean.Gradient.time_gradient('480km') |
| | 389M | 389M | 1.00 | mpas_ocean.Integrate.peakmem_integrate('120km') |
| | 182±5ms | 177±1ms | 0.97 | mpas_ocean.Integrate.time_integrate('120km') |
| | 12.0±0.04ms | 12.1±0.05ms | 1.00 | mpas_ocean.Integrate.time_integrate('480km') |
| | 342±7ms | 347±4ms | 1.01 | mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('120km', 'exclude') |
| | 348±4ms | 348±2ms | 1.00 | mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('120km', 'include') |
| | 343±3ms | 344±4ms | 1.00 | mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('120km', 'split') |
| | 22.8±0.6ms | 22.7±0.1ms | 0.99 | mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('480km', 'exclude') |
| | 22.8±0.4ms | 23.0±0.2ms | 1.01 | mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('480km', 'include') |
| | 22.6±0.3ms | 23.0±0.2ms | 1.01 | mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('480km', 'split') |
| | 56.0±0.1ms | 56.5±0.5ms | 1.01 | mpas_ocean.RemapDownsample.time_inverse_distance_weighted_remapping |
| | 45.7±0.2ms | 45.9±0.2ms | 1.01 | mpas_ocean.RemapDownsample.time_nearest_neighbor_remapping |
| | 360±0.8ms | 361±1ms | 1.00 | mpas_ocean.RemapUpsample.time_inverse_distance_weighted_remapping |
| | 266±2ms | 264±0.2ms | 0.99 | mpas_ocean.RemapUpsample.time_nearest_neighbor_remapping |
| | 294M | 294M | 1.00 | quad_hexagon.QuadHexagon.peakmem_open_dataset |
| | 291M | 291M | 1.00 | quad_hexagon.QuadHexagon.peakmem_open_grid |
| | 5.58±0.2ms | 6.24±0.5ms | ~1.12 | quad_hexagon.QuadHexagon.time_open_grid |

Benchmarks that have got worse:

| Change | Before [b11d011] | After [a0be1c1] | Ratio | Benchmark (Parameter) |
|---|---|---|---|---|
| + | 6.78±0.4ms | 7.93±1ms | 1.17 | quad_hexagon.QuadHexagon.time_open_dataset |

rytam2 and others added 2 commits July 26, 2024 17:43
…rray/weighted-mean (#866)

* updated mean function with weighted arg

* updated weighted-mean functionality in dataarray.py

* edited weights to dask array

---------

Co-authored-by: Rachel Yuen Sum Tam <[email protected]>
Co-authored-by: Rachel Yuen Sum Tam <[email protected]>
@philipc2 philipc2 linked an issue Aug 12, 2024 that may be closed by this pull request
uxarray/core/dataarray.py (3 outdated review threads, resolved)
@philipc2 philipc2 removed the run-benchmark Run ASV benchmark workflow label Sep 15, 2024
@philipc2 philipc2 added the run-benchmark Run ASV benchmark workflow label Oct 10, 2024
nt.assert_equal(result, expected_weighted_mean)


def test_csne30_equal_area():
Member Author

@rytam2

Can you write a test case for this using Dask?

  • Face areas and data are both Dask arrays

weighted_mean = (self * weights).sum(axis=-1) / total_weight

# create a UxDataArray and return it
return UxDataArray(weighted_mean, uxgrid=self.uxgrid)
Member Author

Need to preserve other parameters:

  • Coordinates

@philipc2 philipc2 removed the run-benchmark Run ASV benchmark workflow label Nov 19, 2024
@rytam2
Collaborator

rytam2 commented Nov 22, 2024

UXDataset support

@rytam2 rytam2 marked this pull request as ready for review December 3, 2024 17:55
@rytam2 rytam2 changed the title DRAFT: Weighted Average Weighted Average Dec 6, 2024
Member

@aaronzedwick aaronzedwick left a comment

Looks good! Just a few suggestions.

# compute the total weight
total_weight = weights.sum()

# compute weighted mean #assumption on index of dimension (last one is geometry)
Member

Suggested change
# compute weighted mean #assumption on index of dimension (last one is geometry)
# compute the weighted mean, with an assumption on the index of dimension (last one is geometry)
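The "last dimension is geometry" assumption mentioned in this comment can be illustrated with plain NumPy. This is only a minimal sketch, not the PR's implementation: `weighted_mean_last_axis` is a hypothetical helper, and the sample values and areas are made up for illustration.

```python
import numpy as np


def weighted_mean_last_axis(data, weights):
    """Weighted mean over the LAST axis of `data` (hypothetical helper).

    Assumes the last axis is the geometry dimension (for example faces)
    and that `weights` is 1-D with one entry per element on that axis.
    """
    data = np.asarray(data, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if weights.ndim != 1 or data.shape[-1] != weights.shape[0]:
        raise ValueError("weights must be 1-D and match the last axis of data")

    # compute the total weight
    total_weight = weights.sum()

    # broadcasting multiplies every leading slice by the same weights
    return (data * weights).sum(axis=-1) / total_weight


# two "time steps" on three faces; made-up values and areas
data = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
areas = np.array([1.0, 1.0, 2.0])
result = weighted_mean_last_axis(data, areas)
```

If the geometry dimension were not last, the broadcast would either fail or silently weight the wrong axis, which is why the comment calls out the assumption explicitly.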

This function calculates the weighted mean of a variable,
using the specified `weights`. If no weights are provided, it will automatically select
appropriate weights based on whether the variable is face-centered or edge-centered. If
the variable is neither face nor edge-centered.
Member

Suggested change
the variable is neither face nor edge-centered.
the variable is neither face nor edge-centered a warning is raised, and an unweighted mean is computed instead.
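The behaviour described in the suggested docstring could look roughly like the sketch below. It is an illustration under assumptions, not the code in this PR: `select_weights` and `weighted_or_plain_mean` are hypothetical names, the dimension names `n_face` and `n_edge` are assumed, and using face areas and edge lengths as the default weights is also an assumption.

```python
import warnings


def select_weights(dims, face_areas, edge_lengths):
    """Pick default weights from the variable's dimensions (hypothetical)."""
    if "n_face" in dims:   # assumed name of the face dimension
        return face_areas
    if "n_edge" in dims:   # assumed name of the edge dimension
        return edge_lengths
    warnings.warn(
        "Variable is neither face- nor edge-centered; "
        "computing an unweighted mean instead.",
        UserWarning,
        stacklevel=2,
    )
    return None


def weighted_or_plain_mean(values, dims, face_areas, edge_lengths):
    """Weighted mean when weights are available, plain mean otherwise."""
    weights = select_weights(dims, face_areas, edge_lengths)
    if weights is None:
        return sum(values) / len(values)
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)
```

Keeping the selection step separate from the arithmetic makes the warning path easy to test on its own.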

uxds = ux.open_dataset(quad_hex_grid_path, quad_hex_data_path_face_centered)

# expected weighted average computed by hand
expected_weighted_mean = 297.55
Member

Can you compute this within Python? I think if you can avoid hard coding in the answer that is ideal, although I know this can't always be avoided.

Member

+1 this! Even if it is inevitable to use constants, they shouldn't be showing up as magic numbers within the code and instead should go into some kind of test constants
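One way to act on both suggestions is sketched below: the expected value is derived from the inputs inside the test, and the inputs live in named module-level constants. Everything here is an assumption for illustration: `FACE_VALUES` and `FACE_AREAS` are made-up numbers rather than the quad-hexagon data, and `np.average` stands in for the method under test.

```python
import numpy as np

# Hypothetical test constants; in the real test these would come from the
# opened dataset (data values) and its grid (face areas).
FACE_VALUES = np.array([297.0, 298.5, 296.0, 299.0])
FACE_AREAS = np.array([0.1, 0.2, 0.3, 0.4])


def reference_weighted_mean(values, weights):
    """Independent reference: sum(v * w) / sum(w)."""
    return float((values * weights).sum() / weights.sum())


def test_weighted_mean_matches_reference():
    # stand-in for the implementation under test
    result = np.average(FACE_VALUES, weights=FACE_AREAS)

    # expected value is computed in Python rather than hard-coded
    expected = reference_weighted_mean(FACE_VALUES, FACE_AREAS)

    # tolerance-based comparison is more robust for floats than exact equality
    np.testing.assert_allclose(result, expected)
```

The reference formula is deliberately written out by hand so that the test does not simply compare the implementation with itself.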

Development

Successfully merging this pull request may close these issues.

Weighted Mean Functionality
4 participants