Feature #2924 fcst climo, PR 1 of 2 (#2939)
* Per #2924, update the MPR and ORANK output line types to write duplicates of the existing climo values, and update the header tables and the MPR/ORANK documentation tables.

* Per #2924, update get_n_orank_columns() logic

* Per #2924, update the Stat-Analysis parsing logic to parse the new MPR and ORANK climatology columns.

* Per #2924, making some changes to the vx_statistics library to store climo data... but more work to come. Committing this first set of changes that are incomplete but do compile.

* Per #2924, this big set of changes compiles, but make test produces a segfault for Ensemble-Stat.

* Per #2924, fix return value for is_keeper_obs()

* Per #2924, move fcst_info/obs_info into the VxPairBase pointer.

* Per #2924, update Ensemble-Stat to set the VxPairBase::fcst_info pointer

* Per #2924, update handling of fcst_info and obs_info pointers in Ensemble-Stat

* Per #2924, update the GSI tools to handle the new fcst climo columns.

* Per #2924, add backward compatibility logic so that when old climo column names are requested, the new ones are used.

* Per #2924, print a DEBUG(2) log message if old column names are used.

* Per #2924, switch the unit tests to reference the updated MPR column names rather than the old ones.

* Per #2924, work in progress. Not fully compiling yet

* Per #2924, another round of changes. Removing MPR:FCST_CLIMO_CDF output column. This compiles, but it has not yet been confirmed to run.

* Per #2924, work in progress

* Per #2924, work in progress. Almost compiling again.

* Per #2924, get it compiling

* Per #2924, add back in support for SCP and CDP, which are interpreted as SOCP and OCDP, respectively

* Per #2924, update docs about SCP and CDP threshold types

* Per #2924, minor whitespace changes

* Per #2924, fix an uninitialized pointer bug by defining/calling SeepsClimoGrid::init_from_scratch() member function. The constructor had been calling clear() to delete pointers that weren't properly initialized to nullptr. Also, simplify some map processing logic.

* Per #2924, rename SeepsAggScore from seeps to seeps_agg for clarity and to avoid conflicts in member function implementations.

* Per #2924, fix seeps compilation error in Point-Stat

* Per #2924, fix bug in the boolean logic for handling the do_climo_cdp NetCDF output option.

* Per #2924, add missing exit statement.

* Per #2924, tweak threshold.h

* Per #2924, define one perc_thresh_info entry for each enumerated PercThreshType value

* Per #2924, simplify the logic for handling percentile threshold types and print a log message once when the old versions are still used.

* Per #2924, update the string comparison return value logic

* Per #2924, fix the perc thresh string parsing logic by calling ConcatString::startswith()

* Per #2924, switch all instances of CDP to OCDP. Gen-Ens-Prod was writing NetCDF files with OCDP in the output variable names, but Grid-Stat was requesting that the wrong variable name be read. So the unit tests failed.

* Per #2924, add more doc details

* Per #2924, update default config file to indicate when climo_mean and climo_stdev can be set separately in the fcst and obs dictionaries.

* Per #2924, update the MET tools to parse climo_mean and climo_stdev separately from the fcst and obs dictionaries.

* Per #2924, backing out new/modified columns to minimize reg test diffs

* Per #2924, one more section to be commented out later.

* Per #2924, replace several calls to strncmp() with ConcatString::startswith() to simplify the code

* Per #2924, strip out some more references to OBS_CLIMO_... in the unit tests.

* Per #2924, delete accidental file

* Per #2924, fix broken XML comments

* Per #2924, fix comments

* Per #2924, address SonarQube findings

* Per #2924, tweak a Point-Stat and Grid-Stat unit test config file to make the output more comparable to develop.

* Per #2924, fix a bug in the logic of PairDataPoint and PairDataEnsemble. When looping over the 3-dim array, do not return when checking the climo and fcst values; instead, continue to the next loop iteration.

* Per #2924, address more SonarQube code smells to reduce the overall number in MET for this PR.

* Per #2924, correct the logic for parsing climo data from MPR lines.

* Per #2924, cleanup grid_stat.cc source code by making calls to DataPlane::is_empty() and Grid::nxy().

* Per #2924, remove unneeded ==0
JohnHalleyGotway authored Jul 30, 2024
1 parent 101c074 commit c593187
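The backward-compatibility change listed in the commit message (requests for old climatology column names are answered with the new ones, and a DEBUG(2) message is printed when an old name is used) amounts to an alias lookup. The sketch below is hypothetical: `resolve_mpr_column()` and the alias table are illustrative, the mapping of `CLIMO_MEAN`, `CLIMO_STDEV`, and `CLIMO_CDF` to `OBS_CLIMO_*` names is an assumption to check against the MPR documentation tables touched by this PR, and it writes to `std::clog` instead of calling MET's logger.

```cpp
#include <cassert>
#include <iostream>
#include <map>
#include <string>

// Hypothetical alias table: deprecated climatology column names mapped to
// ASSUMED replacement names. Verify against the MPR documentation tables.
const std::map<std::string, std::string> deprecated_climo_columns = {
   { "CLIMO_MEAN",  "OBS_CLIMO_MEAN"  },
   { "CLIMO_STDEV", "OBS_CLIMO_STDEV" },
   { "CLIMO_CDF",   "OBS_CLIMO_CDF"   }
};

// Return the column name that should actually be read. A deprecated name is
// replaced by its new name, and a message standing in for a DEBUG(2) log
// entry is written to std::clog.
std::string resolve_mpr_column(const std::string &requested) {
   const auto it = deprecated_climo_columns.find(requested);
   if (it == deprecated_climo_columns.end()) return requested;

   std::clog << "DEBUG 2: column \"" << requested << "\" is deprecated, using \""
             << it->second << "\" instead\n";
   return it->second;
}
```

Names that are not in the table pass through unchanged, so callers can route every requested column through the same function.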
Showing 90 changed files with 4,710 additions and 4,319 deletions.
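The commit message describes an uninitialized pointer bug: a constructor called `clear()` to delete pointers that had never been set to `nullptr`, and the fix was to define and call an `init_from_scratch()` member first. The class below is a hypothetical illustration of that pattern, not `SeepsClimoGrid` itself; its members and the `allocate()` helper are invented for the example.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical class illustrating the init_from_scratch()/clear() pattern.
// It is NOT the actual SeepsClimoGrid; its members are invented.
class ClimoGridSketch {
public:
   ClimoGridSketch() { init_from_scratch(); }  // known state before any use
   ~ClimoGridSketch() { clear(); }

   ClimoGridSketch(const ClimoGridSketch &) = delete;
   ClimoGridSketch &operator=(const ClimoGridSketch &) = delete;

   void allocate(std::size_t n) {
      clear();                    // release any previous buffer first
      values = new double[n]();   // value-initialized to 0.0
      size = n;
   }

   // Safe only because every pointer member is either nullptr or owned.
   void clear() {
      delete[] values;            // deleting nullptr is a no-op
      values = nullptr;
      size = 0;
   }

   std::size_t get_size() const { return size; }
   const double *data() const { return values; }

private:
   // Put every member into a known state. Calling clear() from the
   // constructor instead would delete[] an indeterminate pointer, which is
   // undefined behavior.
   void init_from_scratch() {
      values = nullptr;
      size = 0;
   }

   double *values;        // deliberately no default member initializers,
   std::size_t size;      // so init_from_scratch() does the work
};
```

Default member initializers such as `double *values = nullptr;`, or a `std::vector<double>` member, would remove the hazard outright; the sketch keeps raw pointers only to mirror the bug the commit describes.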
8 changes: 7 additions & 1 deletion data/config/EnsembleStatConfig_default
@@ -130,7 +130,8 @@ ens_phist_bin_size = 0.05;
////////////////////////////////////////////////////////////////////////////////

//
// Climatology data
// Climatology mean data
// May be set separately in the "fcst" and "obs" dictionaries
//
climo_mean = {

@@ -149,12 +150,17 @@ climo_mean = {
hour_interval = 6;
}

//
// Climatology standard deviation data
// May be set separately in the "fcst" and "obs" dictionaries
//
climo_stdev = climo_mean;
climo_stdev = {
file_name = [];
}

//
// Climatology distribution settings
// May be set separately in each "obs.field" entry
//
climo_cdf = {
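The default configuration above assigns `climo_stdev = climo_mean;` and then re-specifies `climo_stdev` with only `file_name`. The climo_stdev documentation describes this default as copying over the "climo_mean" setting and then updating the "file_name" entry. A minimal sketch of that copy-then-override behavior follows; the flat `Dict` type and `apply_overrides()` helper are hypothetical and model only the documented behavior, not MET's config parser.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical flat model of a config dictionary: entry name -> value text.
using Dict = std::map<std::string, std::string>;

// Model "climo_stdev = climo_mean;" followed by "climo_stdev = { ... }":
// start from a copy of the base dictionary and replace only the entries the
// later block lists. Unlisted entries keep their copied values.
Dict apply_overrides(Dict base, const Dict &overrides) {
   for (const auto &entry : overrides) {
      base[entry.first] = entry.second;
   }
   return base;
}
```

Because `base` is taken by value, the original `climo_mean` dictionary is left unchanged, just as re-specifying `climo_stdev` does not alter `climo_mean`.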
5 changes: 4 additions & 1 deletion data/config/GenEnsProdConfig_default
@@ -95,7 +95,7 @@ nmep_smooth = {
////////////////////////////////////////////////////////////////////////////////

//
// Climatology data
// Climatology mean data
//
climo_mean = {

@@ -114,6 +114,9 @@ climo_mean = {
hour_interval = 6;
}

//
// Climatology standard deviation data
//
climo_stdev = climo_mean;
climo_stdev = {
file_name = [];
8 changes: 7 additions & 1 deletion data/config/GridStatConfig_default
@@ -75,7 +75,8 @@ obs = fcst;
////////////////////////////////////////////////////////////////////////////////

//
// Climatology data
// Climatology mean data
// May be set separately in the "fcst" and "obs" dictionaries
//
climo_mean = {

@@ -94,12 +95,17 @@ climo_mean = {
hour_interval = 6;
}

//
// Climatology standard deviation data
// May be set separately in the "fcst" and "obs" dictionaries
//
climo_stdev = climo_mean;
climo_stdev = {
file_name = [];
}

//
// Climatology distribution settings
// May be set separately in each "obs.field" entry
//
climo_cdf = {
8 changes: 7 additions & 1 deletion data/config/PointStatConfig_default
@@ -118,7 +118,8 @@ message_type_group_map = [
////////////////////////////////////////////////////////////////////////////////

//
// Climatology data
// Climatology mean data
// May be set separately in the "fcst" and "obs" dictionaries
//
climo_mean = {

@@ -137,12 +138,17 @@ climo_mean = {
hour_interval = 6;
}

//
// Climatology standard deviation data
// May be set separately in the "fcst" and "obs" dictionaries
//
climo_stdev = climo_mean;
climo_stdev = {
file_name = [];
}

//
// Climatology distribution settings
// May be set separately in each "obs.field" entry
//
climo_cdf = {
11 changes: 10 additions & 1 deletion data/config/SeriesAnalysisConfig_default
@@ -61,7 +61,8 @@ obs = fcst;
////////////////////////////////////////////////////////////////////////////////

//
// Climatology data
// Climatology mean data
// May be set separately in the "fcst" and "obs" dictionaries
//
climo_mean = {

@@ -80,11 +81,19 @@ climo_mean = {
hour_interval = 6;
}

//
// Climatology standard deviation data
// May be set separately in the "fcst" and "obs" dictionaries
//
climo_stdev = climo_mean;
climo_stdev = {
file_name = [];
}

//
// Climatology distribution settings
// May be set separately in each "obs.field" entry
//
climo_cdf = {
cdf_bins = 1;
center_bins = FALSE;
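The updated config comments say "climo_mean" and "climo_stdev" may be set separately in the "fcst" and "obs" dictionaries, or once at the top level to apply to both. One way to picture that is a lookup that searches the innermost context first and then each enclosing one. The `ConfigScope` struct and `lookup()` function are hypothetical, and the rule that a "fcst" or "obs" value takes precedence over the top-level value is an assumption of this sketch.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical model of nested config contexts: the top-level file context,
// a "fcst" or "obs" dictionary, or a single "obs.field" entry.
struct ConfigScope {
   std::map<std::string, std::string> entries;
   const ConfigScope *parent = nullptr;  // enclosing context, nullptr at top level
};

// Search the given context first, then each enclosing context in turn.
// ASSUMES a value set inside "fcst" or "obs" takes precedence over the
// top-level value, which otherwise applies to both.
std::optional<std::string> lookup(const ConfigScope &scope, const std::string &key) {
   for (const ConfigScope *s = &scope; s != nullptr; s = s->parent) {
      const auto it = s->entries.find(key);
      if (it != s->entries.end()) return it->second;
   }
   return std::nullopt;
}
```

The same parent chain extends naturally to "climo_cdf", which the comments say may be set in each "obs.field" entry: that entry simply becomes one more level beneath "obs".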
149 changes: 91 additions & 58 deletions docs/Users_Guide/config_options.rst
@@ -60,8 +60,8 @@ The configuration file language supports the following data types:
* Percentile Thresholds:

* A threshold type (<, <=, ==, !=, >=, or >), followed by a percentile
type description (SFP, SOP, SCP, USP, CDP, or FBIAS), followed by a
numeric value, typically between 0 and 100.
type description (SFP, SOP, SFCP, SOCP, USP, FCDP, OCDP, or FBIAS),
followed by a numeric value, typically between 0 and 100.

* Note that the two letter threshold type abbreviations (lt, le, eq, ne,
ge, gt) are not supported for percentile thresholds.
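The percentile threshold grammar in this hunk (an operator, a percentile type, then a number) can be parsed with plain prefix matching, the same idea as the commit's switch to `ConcatString::startswith()`. This is a minimal sketch, not MET's parser: `PercThresh` and `parse_perc_thresh()` are hypothetical, and `std::string::compare()` stands in for `ConcatString`. It maps the deprecated `SCP` and `CDP` types to `SOCP` and `OCDP`, printing a message only the first time each is seen. It also splits off any trailing `(...)` value without interpreting it, because that value is user input for `USP` but only informational for the sample types.

```cpp
#include <cassert>
#include <cstdlib>
#include <iostream>
#include <stdexcept>
#include <string>
#include <vector>

// Parsed form of a percentile threshold such as ">SOCP90" or "<=SFP90(3.5)".
struct PercThresh {
   std::string op;           // <, <=, ==, !=, >=, or >
   std::string type;         // SFP, SOP, SFCP, SOCP, USP, FCDP, OCDP, or FBIAS
   double      value = 0.0;  // number following the type
   std::string paren;        // text inside a trailing "(...)", if present
};

static bool starts_with(const std::string &s, const std::string &prefix) {
   return s.compare(0, prefix.size(), prefix) == 0;
}

// Hypothetical parser. Two-character operators are tried before "<" and ">",
// and the deprecated SCP/CDP types are remapped to SOCP/OCDP.
PercThresh parse_perc_thresh(const std::string &text) {
   static const std::vector<std::string> ops = { "<=", ">=", "==", "!=", "<", ">" };
   static const std::vector<std::string> types = {
      "FBIAS", "SFCP", "SOCP", "FCDP", "OCDP", "SFP", "SOP", "USP", "SCP", "CDP" };
   static bool warned_scp = false;
   static bool warned_cdp = false;

   PercThresh t;
   std::string rest = text;

   for (const auto &op : ops) {
      if (starts_with(rest, op)) { t.op = op; rest.erase(0, op.size()); break; }
   }
   if (t.op.empty()) throw std::invalid_argument("missing threshold operator: " + text);

   for (const auto &type : types) {
      if (starts_with(rest, type)) { t.type = type; rest.erase(0, type.size()); break; }
   }
   if (t.type.empty()) throw std::invalid_argument("unknown percentile type: " + text);

   // Backward compatibility: log once per deprecated type, then remap it.
   if (t.type == "SCP") {
      if (!warned_scp) { std::clog << "SCP is deprecated, processing as SOCP\n"; warned_scp = true; }
      t.type = "SOCP";
   } else if (t.type == "CDP") {
      if (!warned_cdp) { std::clog << "CDP is deprecated, processing as OCDP\n"; warned_cdp = true; }
      t.type = "OCDP";
   }

   // Split off a trailing "(...)" without interpreting its contents.
   const std::string::size_type lp = rest.find('(');
   if (lp != std::string::npos) {
      const std::string::size_type rp = rest.find(')', lp);
      if (rp == std::string::npos || rp + 1 != rest.size())
         throw std::invalid_argument("malformed parenthesized value: " + text);
      t.paren = rest.substr(lp + 1, rp - lp - 1);
      rest.erase(lp);
   }

   char *end = nullptr;
   t.value = std::strtod(rest.c_str(), &end);
   if (rest.empty() || *end != '\0') throw std::invalid_argument("bad numeric value: " + text);
   return t;
}
```

Trying operators longest-first matters: `<=` must be checked before `<` so that `<=SFP90` is not read as `<` followed by `=SFP90`. Two-letter abbreviations such as `gt` are rejected, consistent with the note above.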
@@ -93,8 +93,14 @@ The configuration file language supports the following data types:
* "SOP" for a percentile of the sample observation values.
e.g. ">SOP75" means greater than the 75-th observation percentile.

* "SCP" for a percentile of the sample climatology values.
e.g. ">SCP90" means greater than the 90-th climatology percentile.
* "SFCP" for a percentile of the sample forecast climatology values.
e.g. ">SFCP90" means greater than the 90-th forecast climatology
percentile.

* "SOCP" for a percentile of the sample observation climatology values.
e.g. ">SOCP90" means greater than the 90-th observation climatology
percentile. For backward compatibility, the "SCP" threshold type
is processed the same as "SOCP".

* "USP" for a user-specified percentile threshold.
e.g. "<USP90(2.5)" means less than the 90-th percentile values which
@@ -109,40 +115,59 @@ The configuration file language supports the following data types:
the observations and then chooses a forecast threshold which results in
a frequency bias of 1. The frequency bias can be any float value > 0.0.

* "CDP" for climatological distribution percentile thresholds.
These thresholds require that the climatological mean and standard
deviation be defined using the climo_mean and climo_stdev config file
options, respectively. The categorical (cat_thresh), conditional
(cnt_thresh), or wind speed (wind_thresh) thresholds are defined
relative to the climatological distribution at each point. Therefore,
the actual numeric threshold applied can change for each point.
e.g. ">CDP50" means greater than the 50-th percentile of the
* "FCDP" for forecast climatological distribution percentile thresholds.
These thresholds require that the forecast climatological mean and
standard deviation be defined using the "climo_mean" and "climo_stdev"
config file options, respectively. The categorical (cat_thresh),
conditional (cnt_thresh), or wind speed (wind_thresh) thresholds can
be defined relative to the climatological distribution at each point.
Therefore, the actual numeric threshold applied can change for each point.
e.g. ">FCDP50" means greater than the 50-th percentile of the
climatological distribution for each point.

* When percentile thresholds of type SFP, SOP, SCP, or CDP are requested
for continuous filtering thresholds (cnt_thresh), wind speed thresholds
(wind_thresh), or observation filtering thresholds (obs_thresh in
ensemble_stat), the following special logic is applied. Percentile

* "OCDP" for observation climatological distribution percentile thresholds.
The "OCDP" threshold logic matches the "FCDP" logic described above.
However these thresholds are defined using the observation climatological
mean and standard deviation rather than the forecast climatological data.
For backward compatibility, the "CDP" threshold type is processed the
same as "OCDP".

* When percentile thresholds of type SFP, SOP, SFCP, SOCP, FCDP, or OCDP are
requested for continuous filtering thresholds (cnt_thresh), wind speed
thresholds (wind_thresh), or observation filtering thresholds (obs_thresh
in ensemble_stat), the following special logic is applied. Percentile
thresholds of type equality are automatically converted to percentile
bins which span the values from 0 to 100.
For example, "==CDP25" is automatically expanded to 4 percentile bins:
>=CDP0&&<CDP25,>=CDP25&&<CDP50,>=CDP50&&<CDP75,>=CDP75&&<=CDP100
For example, "==OCDP25" is automatically expanded to 4 percentile bins:
>=OCDP0&&<OCDP25,>=OCDP25&&<OCDP50,>=OCDP50&&<OCDP75,>=OCDP75&&<=OCDP100

* When sample percentile thresholds of type SFP, SOP, SCP, or FBIAS are
requested, MET recomputes the actual percentile that the threshold
* When sample percentile thresholds of type SFP, SOP, SFCP, SOCP, or FBIAS
are requested, MET recomputes the actual percentile that the threshold
represents. If the requested percentile and actual percentile differ by
more than 5%, a warning message is printed. This may occur when the
sample size is small or the data values are not truly continuous.

* When percentile thresholds of type SFP, SOP, SCP, or USP are used, the
actual threshold value is appended to the FCST_THRESH and OBS_THRESH
* When percentile thresholds of type SFP, SOP, SFCP, SOCP, or USP are used,
the actual threshold value is appended to the FCST_THRESH and OBS_THRESH
output columns. For example, if the 90-th percentile of the current set
of forecast values is 3.5, then the requested threshold "<=SFP90" is
written to the output as "<=SFP90(3.5)".

* When parsing FCST_THRESH and OBS_THRESH columns, the Stat-Analysis tool
ignores the actual percentile values listed in parentheses.


.. note::

Prior to MET version 12.0.0, forecast climatological inputs were not
supported. The observation climatological inputs were used to process
threshold types named "SCP" and "CDP".

For backward compatibility, the "SCP" threshold type is processed the same
as "SOCP" and "CDP" the same as "OCDP".

Users are encouraged to replace the deprecated "SCP" and "CDP" threshold
types with the updated "SOCP" and "OCDP" types, respectively.

* Piecewise-Linear Function (currently used only by MODE):

* A list of (x, y) points enclosed in parenthesis ().
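The automatic expansion of an equality percentile threshold into bins, documented in this hunk with the "==OCDP25" example, can be sketched as follows. The function name `expand_equality_perc_thresh()` is hypothetical, and the sketch assumes the step divides 100 evenly; how MET handles a step such as 30 is not described here, so the sketch rejects it.

```cpp
#include <cassert>
#include <cmath>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: expand "==<type><step>" into comma-separated
// percentile bins spanning 0 to 100. ASSUMES the step divides 100 evenly.
std::string expand_equality_perc_thresh(const std::string &type, double step) {
   if (step <= 0.0 || step > 100.0)
      throw std::invalid_argument("step must be in (0, 100]");
   const long nbins = std::lround(100.0 / step);
   if (std::fabs(nbins * step - 100.0) > 1.0e-9)
      throw std::invalid_argument("step must divide 100 evenly");

   std::ostringstream out;
   for (long i = 0; i < nbins; ++i) {
      const double lo = i * step;        // computed from i to avoid accumulated rounding
      const double hi = (i + 1) * step;
      const bool last = (i == nbins - 1);
      if (i > 0) out << ',';
      // Each bin includes its lower edge; only the final bin includes 100.
      out << ">=" << type << lo << "&&" << (last ? "<=" : "<") << type << hi;
   }
   return out.str();
}
```

Called as `expand_equality_perc_thresh("OCDP", 25.0)`, it reproduces the four-bin string given in the documentation.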
@@ -1448,8 +1473,11 @@ climo_mean
----------

The "climo_mean" dictionary specifies climatology mean data to be read by the
Grid-Stat, Point-Stat, Ensemble-Stat, and Series-Analysis tools. It consists
of several entires defining the climatology file names and fields to be used.
Grid-Stat, Point-Stat, Ensemble-Stat, and Series-Analysis tools. It can be
set inside the "fcst" and "obs" dictionaries to specify separate forecast and
observation climatology data or once at the top-level configuration file
context to use the same data for both. It consists of several entries defining
the climatology file names and fields to be used.

* The "file_names" entry specifies one or more file names containing
the gridded climatology data to be used.
@@ -1506,19 +1534,22 @@ climo_stdev

The "climo_stdev" dictionary specifies climatology standard deviation data to
be read by the Grid-Stat, Point-Stat, Ensemble-Stat, and Series-Analysis
tools. The "climo_mean" and "climo_stdev" data define the climatological
distribution for each grid point, assuming normality. These climatological
distributions are used in two ways:
tools. It can be set inside the "fcst" and "obs" dictionaries to specify
separate forecast and observation climatology data or once at the top-level
configuration file context to use the same data for both. The "climo_mean" and
"climo_stdev" data define the climatological distribution for each grid point,
assuming normality. These climatological distributions are used in two ways:

(1)
To define climatological distribution percentile (CDP) thresholds which
can be used as categorical (cat_thresh), continuous (cnt_thresh), or wind
speed (wind_thresh) thresholds.
To define climatological distribution percentiles thresholds (FCDP and
OCDP) which can be used as categorical (cat_thresh), continuous (cnt_thresh),
or wind speed (wind_thresh) thresholds.

(2)
To subset matched pairs into climatological bins based on where the
observation value falls within the climatological distribution. See the
"climo_cdf" dictionary.
observation value falls within the observation climatological distribution.
See the "climo_cdf" dictionary. Note that only the observation climatology
data is used for this purpose, not the forecast climatology data.

This dictionary is identical to the "climo_mean" dictionary described above
but points to files containing climatological standard deviation values
@@ -1535,11 +1566,12 @@ over the "climo_mean" setting and then updating the "file_name" entry.
climo_cdf
---------

The "climo_cdf" dictionary specifies how the the climatological mean
("climo_mean") and standard deviation ("climo_stdev") data are used to
The "climo_cdf" dictionary specifies how the the observation climatological
mean ("climo_mean") and standard deviation ("climo_stdev") data are used to
evaluate model performance relative to where the observation value falls
within the climatological distribution. This dictionary consists of the
following entries:
within the observation climatological distribution. It can be set inside the
"obs" dictionary or at the top-level configuration file context. This
dictionary consists of the following entries:

(1)
The "cdf_bins" entry defines the climatological bins either as an integer
@@ -1553,11 +1585,11 @@ following entries:

(4) The "direct_prob" entry may be set to TRUE or FALSE.

MET uses the climatological mean and standard deviation to construct a normal
PDF at each observation location. The total area under the PDF is 1, and the
climatological CDF value is computed as the area of the PDF to the left of
the observation value. Since the CDF is a value between 0 and 1, the CDF
bins must span that same range.
MET uses the observation climatological mean and standard deviation to
construct a normal PDF at each observation location. The total area under the
PDF is 1, and the climatological CDF value is computed as the area of the PDF
to the left of the observation value. Since the CDF is a value between 0 and 1,
the CDF bins must span that same range.

When "cdf_bins" is set to an array of floats, they explicitly define the
climatological bins. The array must begin with 0.0 and end with 1.0.
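The two uses of the climatological distribution described in the climo_stdev and climo_cdf sections rest on a normal distribution built from the mean and standard deviation at each point. The first is a per-point numeric value for thresholds such as ">OCDP90". The second is the CDF value of the observation, which is used for climo_cdf binning. The helpers below are hypothetical: `climo_cdf()` evaluates the area to the left of a value, `climo_perc_value()` inverts it, and `climo_cdf_bin()` places a CDF value into explicit bins. Bisection is used only to keep the sketch self-contained and is not claimed to be MET's method; the bin-edge inclusivity and the infinite values at 0 and 100 are assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <vector>

// Area of the normal PDF (mean, stdev) to the left of x, i.e. the CDF value.
double climo_cdf(double x, double mean, double stdev) {
   assert(stdev > 0.0);
   return 0.5 * std::erfc(-(x - mean) / (stdev * std::sqrt(2.0)));
}

// Value of x at percentile perc (0 to 100) of the same normal distribution,
// found by bisection on the standardized variable. Returning -inf/+inf at
// 0 and 100 is an ASSUMPTION of this sketch.
double climo_perc_value(double perc, double mean, double stdev) {
   assert(stdev > 0.0);
   if (perc <= 0.0)   return -std::numeric_limits<double>::infinity();
   if (perc >= 100.0) return  std::numeric_limits<double>::infinity();

   const double p = perc / 100.0;
   double lo = -40.0, hi = 40.0;  // standardized search interval
   for (int i = 0; i < 200; ++i) {
      const double mid = 0.5 * (lo + hi);
      if (climo_cdf(mid, 0.0, 1.0) < p) lo = mid; else hi = mid;
   }
   return mean + stdev * 0.5 * (lo + hi);
}

// Index of the bin containing a CDF value, given explicit bin edges that
// begin with 0.0 and end with 1.0. Each bin includes its lower edge and the
// last bin also includes 1.0 (an ASSUMED convention).
std::size_t climo_cdf_bin(double cdf, const std::vector<double> &edges) {
   if (edges.size() < 2 || edges.front() != 0.0 || edges.back() != 1.0)
      throw std::invalid_argument("bin edges must begin with 0.0 and end with 1.0");
   if (!(cdf >= 0.0 && cdf <= 1.0))  // also rejects NaN
      throw std::out_of_range("CDF value must lie in [0, 1]");

   for (std::size_t i = 0; i + 1 < edges.size(); ++i) {
      if (cdf < edges[i + 1]) return i;
   }
   return edges.size() - 2;  // cdf == 1.0 lands in the last bin
}
```

For example, with a climatological mean of 10 and standard deviation of 2, an observation of 10 has a CDF value of 0.5, which falls in bin index 2 (the [0.5, 0.75) bin) for edges {0.0, 0.25, 0.5, 0.75, 1.0}.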
@@ -1601,20 +1633,21 @@ all pairs into a single climatological bin.
climate_data
------------

When specifying climatology data for probability forecasts, either supply a
probabilistic "climo_mean" field or non-probabilistic "climo_mean" and
"climo_stdev" fields from which a normal approximation of the climatological
probabilities should be derived.

When "climo_mean" is set to a probability field with a range of [0, 1] and
"climo_stdev" is unset, the MET tools use the "climo_mean" probability values
directly to compute Brier Skill Score (BSS).
When specifying observation climatology data to evaluate probability
forecasts, either supply a probabilistic observation "climo_mean" field or
non-probabilistic "climo_mean" and "climo_stdev" fields from which a normal
approximation of the observation climatological probabilities should be
derived.

When "climo_mean" and "climo_stdev" are both set to non-probability fields,
the MET tools use the mean, standard deviation, and observation event
threshold to derive a normal approximation of the climatological
probabilities.
When the observation "climo_mean" is set to a probability field with a range
of [0, 1] and "climo_stdev" is unset, the MET tools use the "climo_mean"
probability values directly to compute Brier Skill Score (BSS).

When the observation "climo_mean" and "climo_stdev" are both set to
non-probability fields, the MET tools use the mean, standard deviation, and
observation event threshold to derive a normal approximation of the
observation climatological probabilities.

The "direct_prob" option controls the derivation logic. When "direct_prob" is
true, the climatological probability is computed directly from the
@@ -1697,7 +1730,7 @@ Point-Stat and Ensemble-Stat, the reference time is the forecast valid time.

mask
----

The "mask" entry is a dictionary that specifies the verification masking
regions to be used when computing statistics. Each mask defines a
geographic extent, and any matched pairs falling inside that area will be
@@ -3759,7 +3792,7 @@ obs_prepbufr_map
Default mapping for PREPBUFR. Replace input BUFR variable names with GRIB
abbreviations in the output. This default map is appended to obs_bufr_map.
This should not typically be overridden. This default mapping provides
backward-compatibility for earlier versions of MET which wrote GRIB
backward compatibility for earlier versions of MET which wrote GRIB
abbreviations to the output.

.. code-block:: none
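The climate_data section in this diff describes two ways to obtain observation climatological probabilities. One is to use a probabilistic "climo_mean" directly. The other is to derive a normal approximation from "climo_mean", "climo_stdev", and the observation event threshold. The sketch below covers both cases with a hypothetical `climo_event_prob()` function. The effect of the "direct_prob" option, whose description is truncated in this diff, is not modeled.

```cpp
#include <cassert>
#include <cmath>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical helper: observation climatological probability of the event
// "obs <op> thresh".
// Case 1: no stdev, so mean is assumed to already be a probability in [0, 1]
//         and is used directly (op and thresh are then ignored).
// Case 2: mean and stdev define a normal distribution, and the probability
//         is the area on the event side of the threshold.
double climo_event_prob(double mean, std::optional<double> stdev,
                        const std::string &op, double thresh) {
   if (!stdev) {
      if (mean < 0.0 || mean > 1.0)
         throw std::invalid_argument("probability climo_mean must lie in [0, 1]");
      return mean;
   }
   if (*stdev <= 0.0) throw std::invalid_argument("climo_stdev must be positive");

   // P(obs < thresh); for a continuous distribution this equals P(obs <= thresh).
   const double below = 0.5 * std::erfc(-(thresh - mean) / (*stdev * std::sqrt(2.0)));
   if (op == "<" || op == "<=") return below;
   if (op == ">" || op == ">=") return 1.0 - below;
   throw std::invalid_argument("unsupported event operator: " + op);
}
```

When the threshold equals the climatological mean, the normal approximation gives a probability of 0.5 for either direction of the event.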