Refactor Evaluator/MetricEvaluator #14

Closed · aimalz opened this issue May 26, 2023 · 7 comments
Labels: good first issue (Good for newcomers), help wanted (Extra attention is needed)

@aimalz (Collaborator) commented on May 26, 2023

Riffing off @hdante's point in #11, the evaluation subpackage has some unexpected behavior that could make it unnecessarily computationally intensive. Currently the Evaluator in src/rail/evaluation is a meta-metric that evaluates all the specific metrics, while MetricEvaluator, the base class for individual metrics, lives in src/rail/evaluation/metrics. This is the reverse of how creation and estimation are organized, and it obscures the fact that Evaluator is not the base class but a shortcut for running all the subclasses that constitute metrics. We should refactor this (and propagate the changes through the unit tests and demos) so that the base class with the basic name sits outside the metrics directory and the meta-metric, under an appropriately descriptive name, sits inside it, mirroring the structures of creation and estimation (see the sketch below). This will make it more straightforward for users to avoid costly metrics calculations unless they truly need them.
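A minimal sketch of the proposed layout, written as plain Python; the module paths and the meta-metric name (full_suite.py, FullSuiteEvaluator) are illustrative placeholders, not part of the issue:

```python
# src/rail/evaluation/evaluator.py -- the base class gets the basic name
class Evaluator:
    """Base class for a single-metric evaluation stage."""

    def evaluate(self, estimate, reference):
        raise NotImplementedError


# src/rail/evaluation/metrics/full_suite.py -- the "run everything" stage
# gets a descriptive name and lives under metrics/, mirroring how
# creation and estimation are laid out.
class FullSuiteEvaluator(Evaluator):
    """Convenience stage that runs every registered Evaluator subclass.

    Users opt into this explicitly instead of paying for every metric
    when they only need one.
    """

    def evaluate(self, estimate, reference):
        return {
            cls.__name__: cls().evaluate(estimate, reference)
            for cls in Evaluator.__subclasses__()
            if cls is not FullSuiteEvaluator
        }
```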

@eacharles (Collaborator) commented on May 26, 2023 via email

@aimalz (Collaborator, author) commented on Aug 4, 2023

@yanzastro had a very relevant thought on metrics for pipelines where true redshifts are not available: a pairwise difference between qp.Ensemble.ancil arrays of point estimates, where rows in the Ensemble could be, for example, cells in a SOM (as per LSSTDESC/rail_som#1), once LSSTDESC/rail_som#2 produces per-cell photo-z PDF estimates.
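One plausible reading of that suggestion, sketched with plain numpy under the assumption that each qp.Ensemble carries a named point-estimate array in its ancil table (the key "zmode" and the function name below are placeholders):

```python
import numpy as np

def pairwise_point_estimate_diff(ens_a, ens_b, key="zmode"):
    """Row-by-row difference of point estimates stored in two Ensembles' ancil tables.

    Rows could be SOM cells, so this compares cell-wise point estimates
    between two ensembles covering the same cells.
    """
    a = np.asarray(ens_a.ancil[key]).ravel()
    b = np.asarray(ens_b.ancil[key]).ravel()
    if a.shape != b.shape:
        raise ValueError("Ensembles must have the same number of rows")
    return a - b
```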

@aimalz (Collaborator, author) commented on Aug 17, 2023

Just catching this up with notes from the retreat before I get too far into implementing it in the corresponding branch: the plan is to define classes of metrics based on the inputs they take (the three logical pairings of point values and PDFs, with the estimate always first and the truth/reference always second). This lets users specify the subset of metrics they want to calculate as keyword config options while keeping the number of metric output files low (one per class), with the added bonus that the paired input types alone tell you which metrics each class provides. Here's an outline of what goes where (a rough sketch follows the list):

  • PointToPointEvaluator: bias, scatter, "robust" versions thereof, IQR, MAD, various "outlier rate" definitions
  • ProbToProbEvaluator: KLD, KS, AD, CvM, Brier, EMD/Wasserstein
  • ProbToPointEvaluator: CDE Loss, CRPS, PIT (maybe with statistics thereof, but maybe not?)
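
A rough sketch of how that split might look in code; the class bodies and metric name lists are illustrative only, and the real stages would plug into the RAIL stage/configuration machinery rather than being plain classes:

```python
class Evaluator:
    """Shared base: subclasses declare which metrics they can compute."""

    metric_names: tuple = ()

    def evaluate(self, estimate, reference, metrics=None):
        """Compute the requested subset of metrics (default: all) and return a dict."""
        raise NotImplementedError


class PointToPointEvaluator(Evaluator):
    # point estimates vs. point truths
    metric_names = ("bias", "robust_bias", "scatter", "robust_scatter",
                    "iqr", "mad", "outlier_rate")


class ProbToProbEvaluator(Evaluator):
    # PDF estimates vs. reference PDFs
    metric_names = ("kld", "ks", "ad", "cvm", "brier", "emd")


class ProbToPointEvaluator(Evaluator):
    # PDF estimates vs. point truths
    metric_names = ("cde_loss", "crps", "pit")
```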

This plan also calls for a standalone stage that generates point estimates from PDFs, overcoming the current restriction that the only way to make point estimates in a pipeline is to do so at estimation time (a rough sketch follows the note below).

  • PointEstReducer (or somesuch): median, mode, mean, RBPE, sample

(Note that this functionality has some implications for the formatting of point estimates calculated by an Estimator; @drewoldag and I will outline these in detail in a design document shortly.)
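For that standalone stage, a minimal sketch that assumes only that a qp.Ensemble can evaluate its PDFs on a grid via ens.pdf(grid); the function name is hypothetical, and the RBPE and sampling reductions are omitted:

```python
import numpy as np

def reduce_to_point_estimates(ens, grid, how="mode"):
    """Collapse each PDF in `ens` to a single point estimate on `grid`."""
    pdfs = np.atleast_2d(ens.pdf(grid))          # shape (n_objects, n_grid)
    if how == "mode":
        return grid[np.argmax(pdfs, axis=1)]
    if how == "mean":
        norm = np.trapz(pdfs, grid, axis=1)
        return np.trapz(pdfs * grid, grid, axis=1) / norm
    if how == "median":
        cdfs = np.cumsum(pdfs, axis=1)           # rough Riemann-sum CDF
        cdfs = cdfs / cdfs[:, -1:]
        return np.array([np.interp(0.5, c, grid) for c in cdfs])
    raise ValueError(f"unknown point-estimate reduction: {how}")
```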

@eacharles (Collaborator) commented on Aug 17, 2023 via email

@joezuntz (Contributor) commented:

Hi all. Alex asked me about stages that can have variable numbers of inputs. I'll have a think about this and get back to you.

@ztq1996 (Contributor) commented on Mar 21, 2024

I want to revisit this next Monday (3/25). Since the codebase has changed, the functionality of the current Evaluator class in src/evaluation/evaluator.py should be fulfilled by the single evaluator class in the eac branch; that frees up the superclass so we can build the metric evaluator classes on top of it.

@eacharles (Collaborator) commented:

Done with #98.
