As discussed with @annakrystalli, one of the reasons #161 had a lot of back and forth was that there were many fluid representations of terminology that turned my brain to oatmeal.
Note
This is a discussion about internal data representation in relation to the hubverse schemas.
One concept that escaped me until I had a walkthrough was a "standard" ID type representation. A standard ID type representation is a list with two elements:
required: ALL elements that must be present
optional: ANY elements that may be present
These are present in both task_ids and output_type_ids. For example, the horizon task ID and the pmf output type ID from cdcepi/FluSight-forecast-hub both take this form (with pmf modified to be "optional" for this example).
By having the list of required and optional, we can use the same machinery to process both task IDs and output type IDs. Optional task IDs give modelers flexibility in what they model against. For example, if a given location is optional and the model is not performing well there, the modeling team does not have to submit that location. For output type IDs pre-v4, modeling teams could likewise submit a subset of them, but this made less sense because the output type IDs within a given output type are all related, so excluding one makes the output harder to handle downstream.
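To make the "same machinery" point concrete, here is a minimal R sketch. The values are placeholders rather than the hub's actual configuration, and the helpers `all_id_values()` and `covers_ids()` are hypothetical names invented for this illustration; they are not hubValidations functions.

```r
# Placeholder values for illustration only; not copied from any hub config.
horizon_task_id <- list(
  required = c(0L, 1L),
  optional = c(2L, 3L)
)
pmf_output_type_id <- list(
  required = NULL,
  optional = c("low", "moderate", "high")
)

# Hypothetical helper: every value a submission may contain.
# Works on any list that has `required` and `optional` elements.
all_id_values <- function(ids) {
  c(ids$required, ids$optional)
}

# Hypothetical helper: TRUE when a submission contains ALL required values
# and nothing outside the required + optional set.
covers_ids <- function(submitted, ids) {
  all(ids$required %in% submitted) &&
    all(submitted %in% all_id_values(ids))
}

covers_ids(c(0L, 1L, 3L), horizon_task_id)  # required present, one optional
covers_ids(c(1L, 2L), horizon_task_id)      # required value 0 is missing
covers_ids("low", pmf_output_type_id)       # subset of optional values only
```

Because both objects share the two-element shape, neither helper needs to know whether it was handed a task ID or an output type ID.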
The v4 switch
The switch to v4 meant that every output type gained a new child element called is_required, and the optional element was removed from output_type_id. Consider a PMF output type with only optional values: in v3 (-) the values sit under optional, whereas in v4 (+) they move to required and is_required is set to false.
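As a rough picture of that change, the two shapes can be written as R lists. This is only a sketch: the category values are placeholders, and it assumes `is_required` sits at the output type level next to `output_type_id`.

```r
# v3-style shape: values of an optional output type live under `optional`.
# Category values are placeholders, assumed for illustration.
pmf_v3 <- list(
  output_type_id = list(
    required = NULL,
    optional = c("low", "moderate", "high")
  )
)

# v4-style shape: the same values live under `required`, there is no
# `optional` element, and `is_required` marks the output type as optional.
# The placement of `is_required` here is an assumption of this sketch.
pmf_v4 <- list(
  output_type_id = list(
    required = c("low", "moderate", "high")
  ),
  is_required = FALSE
)
```

The values are identical in both versions; only the place where "optional" is recorded changes.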
In pre-v4 hubValidations, all output type ids (with the exception of sample) were represented internally as a list of required and optional elements. This was implicitly the standard data representation and did not need to be explicitly stated.
However, with v4, all output types have only required elements and no optional, even if the output type itself is optional. To avoid writing a completely new codebase for v4, @annakrystalli had the clever idea to keep the list of required and optional values as the standard type, filling it in according to context, and implemented this in a series of functions (hubValidations/R/expand_model_out_grid.R, lines 602 to 692 in 9a19893):
# Create a list of NULL or NA required and optional output type id values depending
# on whether the output type is required or optional. Allows us to use current
# infrastructure to convert `NULL`s to `NA`s in a back-compatible way.
null_output_type_ids <- function(is_required) {
  if (is_required) {
    list(required = NA, optional = NULL)
  } else {
    list(required = NULL, optional = NA)
  }
}
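The same pattern extends naturally to non-null values. The sketch below is an assumption about how a v4 output type could be mapped back to the standard representation; `to_standard_ids()` is a hypothetical name and is not taken from hubValidations.

```r
# Hypothetical helper, not part of hubValidations: place v4 output type id
# values under `required` or `optional` depending on `is_required`, so the
# result has the same two-element shape used before v4.
to_standard_ids <- function(values, is_required) {
  if (isTRUE(is_required)) {
    list(required = values, optional = NULL)
  } else {
    list(required = NULL, optional = values)
  }
}

# Placeholder values: an optional output type ends up with all of its
# values under `optional`.
to_standard_ids(c("low", "moderate", "high"), is_required = FALSE)
```

Downstream code that expects a list of required and optional can then stay unchanged regardless of schema version.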
Internal documentation need
This distinction of a "standard" type id representation is a new concept and is lightly documented in the code, but it could benefit from compartmentalization. Thus, I would like to take a stab at moving all the code concerned with "standard" type ids to a separate file and rearrange some of the comments to bubble up the take-home messages. I believe this will help with maintainability and refactorability of the code in the long run.