Equalise cubes #6257

Merged
merged 17 commits into from
Dec 20, 2024

pp-mo
Member

@pp-mo pp-mo commented Dec 16, 2024

Closes #6248


codecov bot commented Dec 16, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.85%. Comparing base (ec44731) to head (a19b409).
Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6257      +/-   ##
==========================================
+ Coverage   89.83%   89.85%   +0.01%     
==========================================
  Files          88       88              
  Lines       23347    23385      +38     
  Branches     4344     4357      +13     
==========================================
+ Hits        20974    21012      +38     
  Misses       1646     1646              
  Partials      727      727              


@pp-mo pp-mo marked this pull request as ready for review December 17, 2024 22:41
@pp-mo pp-mo requested a review from stephenworsley December 17, 2024 22:42
Member Author

pp-mo commented Dec 17, 2024

Status update

I think this is now good enough to consider as it is, though I still anticipate that other useful features could be added.
It's worth considering what's included here and why, and also what is anticipated to be added in future.

"Grouping" implementation

As noted in the original issue (section "Grouping of input cubes ?"),
it was realised that, in order to usefully apply "equalisation" operations like equalise_attributes to all cubes in a file load
(as we eventually hope to: see the section "Embedding in extended "combine_cubes" operation"),
they must be applied over "groups" of input cubes, not over all cubes from the whole file.

The notes there describe the problem of trying to rationalise the different ways in which merge and concatenate do this "grouping".
However, the current implementation drastically simplifies this by grouping based on cube.metadata only.
I think this will make an "adequate" distinction between the input cube groups over which "equalisation" operations are applied,
and, obviously:

  • it works the same for both merge + concatenate
  • it is simple enough to clearly + fully document
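
As a rough illustration of grouping on metadata equality alone, here is a minimal sketch. It does not use Iris: `FakeCube` and `group_by_metadata` are hypothetical stand-ins invented for this example, and a plain tuple stands in for the real `cube.metadata` object. Groups are found by equality comparison rather than hashing, since attribute dictionaries are not hashable.

```python
from dataclasses import dataclass, field


@dataclass
class FakeCube:
    """Hypothetical stand-in for a cube: only its metadata matters here."""

    standard_name: str
    units: str
    attributes: dict = field(default_factory=dict)

    @property
    def metadata(self):
        # A plain tuple stands in for the real cube.metadata object.
        return (self.standard_name, self.units, self.attributes)


def group_by_metadata(cubes):
    """Collect cubes into groups whose metadata compare equal."""
    groups = []  # (metadata, [cubes]) pairs, in first-seen order
    for cube in cubes:
        for key, members in groups:
            if key == cube.metadata:
                members.append(cube)
                break
        else:
            # No existing group matched, so start a new one.
            groups.append((cube.metadata, [cube]))
    return [members for _, members in groups]


cubes = [
    FakeCube("air_temperature", "K"),
    FakeCube("air_pressure", "Pa"),
    FakeCube("air_temperature", "K"),
]
groups = group_by_metadata(cubes)
group_sizes = [len(members) for members in groups]
```

Here the two temperature cubes land in one group and the pressure cube in another, and the same grouping would serve both a merge and a concatenate.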

Currently included options :

  • unify_time_units because
    • it already exists + has proved useful
  • equalise_attributes because ...
    • it already exists + has proved useful
    • it has a particular relation to cube metadata, and an interaction with input 'grouping'
      (it both affects the grouping and is affected by it), so this is an opportunity to sort out how that needs to function
  • unify_names because ...
    • this operation may be needed to enable netcdf data to be concatenated,
      which is more likely to be wanted now we've made that available on loading
    • it has a particular relation to cube metadata, and interaction with input 'grouping',
      so it's worth resolving + documenting how that works in the initial version of the function
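
To make the two-way interaction between attribute equalisation and grouping concrete, here is a small sketch using plain dictionaries in place of cubes. It assumes that, when attributes are being equalised, they are left out of the grouping key (otherwise cubes with differing attributes would never share a group). That is my reading of the comment above, not a statement of how the PR implements it, and `equalise_attributes_in_group` is a hypothetical helper.

```python
def equalise_attributes_in_group(attribute_dicts):
    """Keep only the attributes that every dict in the group shares, with equal values."""
    first, *rest = attribute_dicts
    common = {
        key: value
        for key, value in first.items()
        if all(key in other and other[key] == value for other in rest)
    }
    for attrs in attribute_dicts:
        # Modify each dictionary in place.
        attrs.clear()
        attrs.update(common)


# Plain dicts stand in for cubes in this sketch.
cubes = [
    {"name": "air_temperature", "units": "K", "attributes": {"source": "model", "run": 1}},
    {"name": "air_temperature", "units": "K", "attributes": {"source": "model", "run": 2}},
    {"name": "air_pressure", "units": "Pa", "attributes": {"source": "obs", "run": 1}},
]

# Assumption: the grouping key deliberately excludes the attributes.
groups = {}
for cube in cubes:
    groups.setdefault((cube["name"], cube["units"]), []).append(cube)

for members in groups.values():
    equalise_attributes_in_group([cube["attributes"] for cube in members])
```

After this runs, the two temperature cubes share identical attributes (the differing `run` value is removed), while the single pressure cube is untouched because it has no group partner to disagree with.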

Other possible, future options

See also list in original issue

  • unify selected (compatible) units, e.g.
    • unify_compatible_units=['m', 'Pa']
  • make approximately-equal coordinates equal
  • remove aux-coords, cell-methods, cell-measures etc, e.g.
    • remove_ancils=True
    • remove_cell_measures="a_cell"
  • apply 'new-axis' to promote certain scalar coords to dims, with selected additional components (see "Code solutions for time-dependent hybrid height", #6165, for conceptual background)
    • make_axis="time"
    • make_axis={"time": "surface_altitude"}
  • remove_coord_bounds

We can also, in future, usefully include this in the "combine_cubes" operation and "COMBINE_POLICY" / "LOAD_POLICY" settings.
I anticipate that we can add a single "equalisation_kwargs" keyword which is set to a dictionary arg. This enables a single 'equalisation phase' to occur just once before the merge/concat operation.
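
The shape of that idea might look something like the sketch below. Everything in it is hypothetical: `combine_sketch`, `equalise_sketch` and the `drop_attributes` option are toy names invented for illustration, plain dictionaries stand in for cubes, and the "combine" step merely collapses cubes that have become identical. Only the pattern of passing a single dictionary of keywords through to one equalisation phase comes from the comment above.

```python
def equalise_sketch(cubes, *, drop_attributes=False):
    """Toy equalisation phase; `drop_attributes` is an invented option."""
    if drop_attributes:
        cubes = [{**cube, "attributes": {}} for cube in cubes]
    return cubes


def combine_sketch(cubes, *, equalisation_kwargs=None):
    """Toy combine: equalise once (if requested), then collapse identical cubes."""
    if equalisation_kwargs is not None:
        # The single equalisation phase, run once before any merge/concat step.
        cubes = equalise_sketch(cubes, **equalisation_kwargs)
    combined = []
    for cube in cubes:
        if cube not in combined:
            combined.append(cube)
    return combined


cubes = [
    {"name": "air_temperature", "attributes": {"run": 1}},
    {"name": "air_temperature", "attributes": {"run": 2}},
]
without_equalisation = combine_sketch(cubes)
with_equalisation = combine_sketch(cubes, equalisation_kwargs={"drop_attributes": True})
```

Without the keyword the two inputs stay separate; with it they collapse to one result, and the inputs themselves are not modified.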

Contributor

@stephenworsley stephenworsley left a comment

Looking good, just a couple things, mostly about documentation. This will also want a whatsnew describing the new function.

Member Author

pp-mo commented Dec 19, 2024

Thanks @stephenworsley

I think this is back with you now, I have attempted to address all the points raised so far.

Don't be shy of re-raising points, or of finding more problems: I think this is rather new territory, and I'm conscious it hasn't had a whole lot of scrutiny.
In particular, the solution is, if anything, a bit simpler than I initially expected, but there is also some "special pleading" about the particular operations I have chosen to implement, and about the logic of the implementation. Some of it is also probably influenced by "expectations of future extensions", which is always a bit suspect.

Member Author

pp-mo commented Dec 19, 2024

Update:

oops, forgot a whatsnew. I'll get onto that, too.

Contributor

@stephenworsley stephenworsley left a comment

Looking good. I think the logic of the way you've implemented things makes sense, as does starting with a simpler implementation, and this looks like it should be a pretty nice convenience in a lot of common use cases.

Just a couple more minor quibbles here and I'm happy merging this.

@pp-mo pp-mo marked this pull request as draft December 19, 2024 23:48
Member Author

pp-mo commented Dec 19, 2024

Update

Just realized a possible problem, hence I've put this into draft temporarily.

Somewhat as @bjlittle suggested, it might have been better to use the metadata APIs for logic.
I just now realized that I have not tested this with real metadata content. The problem with the metadata "snapshot" dictionaries I am using is that they will not support equality testing if they contain array attributes, whereas actual metadata objects would.

My purpose was (a) to make an independent stable copy and (b) that it should be easily modifiable, which actual metadata objects aren't. But I think now I will look into doing this with actual metadata instead.

I don't expect the code to change much, but I will add some practical testing of the array-attributes use case.

@pp-mo pp-mo marked this pull request as ready for review December 20, 2024 11:35
Member Author

pp-mo commented Dec 20, 2024

Update

OK, panic over, sorry about that. Please reconsider now, @stephenworsley!

I added a test for array-attribute handling, and to my surprise it all just worked anyway (!)

Debugging, I quickly realised that the reason is that in the cube_grouping_values dictionaries, the 'attributes' values are deep-copies of the original CubeAttrsDicts.
So those do of course support the required comparison operation, and all works as it should.
As I originally said, the 'grouping' is only based on equality, it has no use for more sophisticated metadata behaviours.
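
The underlying worry, and why a dictionary type with array-aware comparison removes it, can be sketched without Iris or NumPy. In the example below `FakeArray` is a hypothetical stand-in that mimics one behaviour of NumPy arrays (an elementwise `==` whose result cannot be truth-tested), and `ArraySafeDict` is an invented stand-in for an attributes dictionary that supports comparison, as the comment above says the deep-copied `CubeAttrsDict` values do. It is not the Iris implementation.

```python
import copy


class FakeArray:
    """Stand-in for an array: '==' is elementwise, and truth-testing it fails."""

    def __init__(self, values):
        self.values = list(values)

    def __eq__(self, other):
        if not isinstance(other, FakeArray):
            return NotImplemented
        return FakeArray(a == b for a, b in zip(self.values, other.values))

    def __bool__(self):
        raise ValueError("truth value of an array is ambiguous")


def values_equal(a, b):
    """Compare two attribute values, reducing array comparisons to a single bool."""
    if isinstance(a, FakeArray) and isinstance(b, FakeArray):
        return len(a.values) == len(b.values) and all(
            x == y for x, y in zip(a.values, b.values)
        )
    return a == b


class ArraySafeDict(dict):
    """Stand-in for an attributes dictionary whose '==' copes with array values."""

    def __eq__(self, other):
        if not isinstance(other, dict) or self.keys() != other.keys():
            return False
        return all(values_equal(self[key], other[key]) for key in self)

    def __ne__(self, other):
        return not self.__eq__(other)


# Plain dictionaries holding distinct (but equal-valued) arrays cannot be compared.
plain_a = {"levels": FakeArray([1, 2, 3])}
plain_b = {"levels": FakeArray([1, 2, 3])}
try:
    plain_a == plain_b
    plain_dicts_comparable = True
except ValueError:
    plain_dicts_comparable = False

# A deep copy keeps the dictionary type, and so keeps the safe comparison.
safe_a = ArraySafeDict(plain_a)
safe_b = copy.deepcopy(safe_a)
snapshots_equal = safe_a == safe_b
differs = safe_a == ArraySafeDict({"levels": FakeArray([1, 2, 4])})
```

Since the grouping needs nothing beyond an equality test, a snapshot whose 'attributes' entry retains this kind of comparison is sufficient.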

Contributor

@stephenworsley stephenworsley left a comment

Looks good to me.

@stephenworsley stephenworsley merged commit a326a4c into SciTools:main Dec 20, 2024
21 checks passed
Successfully merging this pull request may close these issues.

inital draft "equalise_cubes" operation to assist merge/concatenate
2 participants