Important Maintenance: improve gather_dimensions #715

rob-taggart · 2024-10-28T04:05:35Z

A number of improvements requested:

The gather_dimensions docstring needs updating. Current arguments are iterables, but the docstring indicates they are xarray objects.
Improve handling of weights argument. At the moment if weights=None then we cannot pass on weights.dims to gather_dimensions. So we need to use the following logic prior to every gather_dimensions call: if weights is not None then pass on weights.dims else pass on None. It would be good if this logic was handled by gather_dimensions itself. Probably the easiest solution is to allow gather_dimensions to accept either iterables or xarray objects as inputs (at least for weights if not for fcst or obs).
gather_dimensions has a kwarg score_specific_fcst_dims. It would be good to have an analogous kwarg score_specific_obs_dims. I have at least one use case for it in a new metric I am writing. I don't think this concept needs to be expanded beyond fcsts and obs.
Code for many (most? all?) legacy metrics don't model calling gather_dimensions early as part of arg checks. It would be good to fix this up so that good coding is modelled well for new metrics. This issue has already been noted: Ensure gather_dimensions occurs early in score calculation APIs #299

@tennlee , @nicholasloveday

The text was updated successfully, but these errors were encountered:

nikeethr · 2024-10-30T03:01:42Z

All the improvements sound good.

is valid, but it will never be perfect. Optional types a) necessitate branches b) can be reduced to their base value only if there is some sort of unifying operation that works for both null and "some value" consistently. Either that or the stack permanently branches to either handling null or handling some value. Otherwise, its unavoidable to have these comparisons here and there. I'm not familiar enough with the code to comment on it further though.

nicholasloveday · 2024-10-30T03:20:32Z

Regarding 2. Currently before calling gather_dimensions, to get it to work when weights could be None or an xarray object, you need to have something like the following (as per the crps_for_ensemble function).

weights_dims = None
if weights is not None:
    weights_dims = weights.dims

I agree with Rob that instead of repeating this logic everywhere, it should just be implemented once within gather_dimensions.

Note that most metrics aren't even using the weights_dims arg in gather_dimensions when they should be. I think this was an oversight from back when gather_dimensions and gather_dimensions2 were merged into the one function.

tennlee · 2024-10-30T04:03:22Z

Regarding (2), I disagree that it should just be in gather_dimensions, but of course I'm open to discussing the pros and cons. This code isn't user-facing code, so the focus should be on testatibility and repeatability. It is substantially easier to write well-targetted unit tests if the weights_dims are passed in separately rather than being extracted from the xarray. This is because the tests for the actual weights logic can then be written using strings and simple objects that are easy to construct, without any risk of side-effects. While it's slightly longer calling code as a result, it's not user-facing calling code so I think it's reasonable to put the responsibility inside the score rather than in gather_dimensions. The code in the score could possible be simplified further to a shorter pattern. I could look into it, but I don't think it's so hard to have that code appear. One approach could be to write a simple function called extract_weights_dim containing the if statement, then all the calling code could pass extract_weights_dim(xarray) saving two lines of code.

nikeethr · 2024-10-30T04:15:19Z

I agree with Rob that instead of repeating this logic everywhere, it should just be implemented once within gather_dimensions.

Again not familiar with the code, so someone else may answer this - the question is: Does the code need to retain the information that weights is an optional type OR does everything past gather_dimensions only depend on the results of gather_dimensions (i.e. weights is consumed and implicit in the result)?

If it's the former, branching is unavoidable since that is by design of the type - otherwise don't use an optional at all. If its the latter then all metrics should use gather_dimensions consistently; in order for this to be guaranteed gather_dimensions should be handled much higher level prior to dispatching calls to metrics; e.g. as some kind of decorator or otherwise a boxed structure/mixin that encompasses this paradigm i.e. (gather [immutable] -> compute [metric dependent]). But the latter will only work if gather_dimensions can guarantee type purity. (i.e output type/structure is consistent for all metrics and doesn't re-introduce branching.)

The question isn't about if the checks should be in gather_dimensions, it should, but imo without sufficient abstraction, it may also need to be outside it.

tennlee changed the title ~~improve gather_dimensions~~ Internal Improvement: improve gather_dimensions Nov 5, 2024

tennlee changed the title ~~Internal Improvement: improve gather_dimensions~~ Important Maintenance: improve gather_dimensions Nov 5, 2024

rob-taggart mentioned this issue Nov 19, 2024

Brier for ensemble with fair brier correction #735

Merged

23 tasks

tennlee added the refactoring label Nov 22, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Important Maintenance: improve gather_dimensions #715

Important Maintenance: improve gather_dimensions #715

rob-taggart commented Oct 28, 2024

nikeethr commented Oct 30, 2024

nicholasloveday commented Oct 30, 2024

tennlee commented Oct 30, 2024

nikeethr commented Oct 30, 2024 •

edited

Loading

Important Maintenance: improve gather_dimensions #715

Important Maintenance: improve gather_dimensions #715

Comments

rob-taggart commented Oct 28, 2024

nikeethr commented Oct 30, 2024

nicholasloveday commented Oct 30, 2024

tennlee commented Oct 30, 2024

nikeethr commented Oct 30, 2024 • edited Loading

nikeethr commented Oct 30, 2024 •

edited

Loading