Support v3.0.0 schema sample specification #19

annakrystalli · 2024-04-29T13:57:32Z

This PR will:

- Add backward-compatible dynamic validation of sample output_type_id_params in validate_config(). Specifically, check that min_ is less than or equal to max_ and that the compound_taskid_set matches valid model task task IDs. Resolves Dynamic validations of sample output types #17
- Breaking change: Update create_output_type_sample() to handle the v3.0.0 schema. In particular, create_output_type_sample() now takes arguments that are incompatible with previous schema versions and returns an object with an output_type_id_params object instead of output_type_id. Resolves Add new create_output_type_sample() to handle new v3.0.0 schema specification #18
- Redirect sample output type tests to the latest schema on the main branch once "add v3.0.0 schema with initial sample schema" (schemas#72) is merged.
- Tests won't pass until the schema is released.

For further details and context see here
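
As a rough illustration of the kind of dynamic checks described above, here is a minimal sketch in base R. It is not the code in this PR: the function name `check_sample_params_sketch()` is hypothetical, and the field names `min_samples_per_task` and `max_samples_per_task` are assumptions about the v3.0.0 sample schema rather than something stated in this thread.

```r
# Hypothetical sketch of checks a JSON schema cannot express on its own.
# Field names are assumed for illustration, not taken from this PR.
check_sample_params_sketch <- function(params, task_ids) {
  problems <- character(0)

  # Check 1: the minimum must not exceed the maximum (when both are present).
  min_n <- params$min_samples_per_task
  max_n <- params$max_samples_per_task
  if (!is.null(min_n) && !is.null(max_n) && min_n > max_n) {
    problems <- c(
      problems,
      "min_samples_per_task must be <= max_samples_per_task"
    )
  }

  # Check 2: every compound_taskid_set value must be a task ID of the
  # modeling task being validated.
  invalid <- setdiff(params$compound_taskid_set, task_ids)
  if (length(invalid) > 0L) {
    problems <- c(
      problems,
      paste0(
        "compound_taskid_set contains invalid task IDs: ",
        paste(invalid, collapse = ", ")
      )
    )
  }

  # An empty character vector means both checks passed.
  problems
}
```

Calling it with a list of parameters and the task IDs of the relevant modeling task returns one message per failed check.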

github-actions · 2024-04-29T14:04:11Z

🚀 Deployed on https://6672872e3bf94cce5b4ac50b--hubadmin-pr-previews.netlify.app

R/utils.R

R/create_output_type_item.R

elray1 · 2024-05-06T14:18:10Z

R/validate-config-utils.R

+# Validate that compound_taskid_set values are valid task ids for a
+# given modeling task group in a given round.
+# Returns NULL if not applicable or check passes and error df row if check fails.
+validate_mt_sample_comp_tids <- function(model_task_grp,
+                                         model_task_i,
+                                         round_i,
+                                         schema) {


Questions about what's going on here and in check_compound_taskids_valid:

The idea is that check_compound_taskids_valid is used when creating a new config file, but this function is used when validating an existing config file?

There may be value in standardizing on naming conventions:

it's not clear to me why we've used validate here and check in the name of the other function. The check function is actually aborting, while this one generates an error message in a data frame that is collected for later use? Are there standard conventions for this kind of thing? I went and looked at checkmate docs, but they don't have things starting with validate.

noting comp_tids here vs. compound_taskids in the other function name

would there be any value or possibility to refactor the check logic here and in that other function? It's not much code and it's not exactly identical, so maybe not. But it is quite similar.

So there are two places where dynamic validations that can't be encoded in our schema need to be performed:

when validating a tasks.json file (validate_mt_sample_comp_tids()). I used validate here because it is part of the config validation process, which is itself based on the concept of using JSON Schema to validate JSON data, powered by the jsonvalidate package. The output of the validate function conforms to the output of jsonvalidate::json_validate(), specifically the errors attribute tbl, which can then be passed to view_config_val_errors().

when checking inputs to functions that create config programmatically. Because we are effectively checking inputs to a function, I opted for check here, à la rlang::check_required() etc., which is indeed designed to throw an error if the condition is not met.

I believe different languages have related but slightly different conventions (e.g. Python uses assert in testing while we use expect), and I don't know of any resource in R that dictates the use of specific verbs for specific situations (I believe the checkmate convention is just their own internal convention). As such, I've just tried to keep these internally consistent, separating functions that are used for validation of JSON from functions used to check inputs to functions generating config. I've also kept such functions separate because, although they might be checking the same thing, the inputs and outputs/side effects of each function are so different, and take up so much of the logic in each function, that trying to shoehorn them into a single function didn't make sense to me.
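
To make the distinction concrete, here is a minimal base R sketch of the two styles. Both function names and the columns of the returned data frame are hypothetical and chosen only for illustration; they are not the actual implementations of validate_mt_sample_comp_tids() or check_compound_taskids_valid().

```r
# "validate" style (hypothetical): never aborts. Returns NULL when the
# check passes, otherwise a one-row data frame that a caller can collect
# alongside other validation errors and report later.
validate_comp_tids_sketch <- function(compound_taskid_set, task_ids) {
  invalid <- setdiff(compound_taskid_set, task_ids)
  if (length(invalid) == 0L) {
    return(NULL)
  }
  data.frame(
    keyword = "compound_taskid_set values", # illustrative column names
    message = paste0(
      "Invalid task IDs: ", paste(invalid, collapse = ", ")
    ),
    stringsAsFactors = FALSE
  )
}

# "check" style (hypothetical): guards the inputs of a function that
# creates config and throws an error as soon as the condition is not met.
check_comp_tids_sketch <- function(compound_taskid_set, task_ids) {
  invalid <- setdiff(compound_taskid_set, task_ids)
  if (length(invalid) > 0L) {
    stop(
      "Invalid task IDs: ", paste(invalid, collapse = ", "),
      call. = FALSE
    )
  }
  invisible(TRUE)
}
```

The condition being tested is the same in both, but the first hands back data for later display while the second interrupts execution, which is where most of the difference in the real functions lies.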

thanks! all sounds good. the one remaining question then is about comp_tids vs. compound_taskids

Fixed in 33a4859

tests/testthat/_snaps/create_model_task.md

tests/testthat/_snaps/create_output_type_item.md

elray1

Thanks! This looks good, I have made a couple of minor comments.

Additionally, I noticed that on lines 146-150 of R/create_output_type_item.R, we have the following docstring with some duplication (github didn't let me comment on this or suggest changes as part of this review because this PR didn't introduce this, but I thought maybe we could just clean it up here):

#' This can be combined with other building blocks which can then be written as
#' or appended to `tasks.json` Hub config files.
#' output type.
#' This can be combined with other building blocks which can then be written as
#' or appended to `tasks.json` Hub config files.

Co-authored-by: Evan Ray <[email protected]>

codecov · 2024-05-07T08:26:51Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 87.92%. Comparing base (1194ab2) to head (3399d17).
Report is 4 commits behind head on main.

❗ Current head 3399d17 differs from pull request most recent head 9a7afa3. Consider uploading reports for the commit 9a7afa3 to get more accurate results

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #19      +/-   ##
==========================================
+ Coverage   87.23%   87.92%   +0.69%     
==========================================
  Files          20       21       +1     
  Lines        1637     1830     +193     
==========================================
+ Hits         1428     1609     +181     
- Misses        209      221      +12

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

annakrystalli · 2024-05-07T08:39:52Z

> Thanks! This looks good, I have made a couple of minor comments.
>
> Additionally, I noticed that on lines 146-150 of R/create_output_type_item.R, we have the following docstring with some duplication (github didn't let me comment on this or suggest changes as part of this review because this PR didn't introduce this, but I thought maybe we could just clean it up here):
>
> #' This can be combined with other building blocks which can then be written as
> #' or appended to `tasks.json` Hub config files.
> #' output type.
> #' This can be combined with other building blocks which can then be written as
> #' or appended to `tasks.json` Hub config files.

Thanks! Fixed this in f026d6c

…skids

…f basic schema validation passes

…dynamic-vals-note 21/22/ Add note about two stage validation / remove hard coded location data

Co-authored-by: Lucie Contamin <[email protected]>

Rename org name

Add dynamic validation of sample output_type_id_params. Resolves #17

625847e

annakrystalli marked this pull request as draft April 29, 2024 13:57

annakrystalli added 5 commits April 29, 2024 17:12

build ignore attic

67fb4c7

try macos-12 runner

619a059

add extract_version_n function

a05815d

Add support for v3.0.0 samples when creating config objects. Resolves #…

0354959

…18

Remove example to silence note

31d10bd

annakrystalli marked this pull request as ready for review April 30, 2024 16:13

annakrystalli added this to the sample output_type v1.0 milestone Apr 30, 2024

fix lint indent issue

80a8873

annakrystalli changed the title ~~[WIP] Support v3.0.0 schema sample specification~~ Support v3.0.0 schema sample specification Apr 30, 2024

annakrystalli added 4 commits May 1, 2024 11:33

Bump version to v1.0.0

ed3a2bd

add note about change in create_output_type_sample output.

60e7062

Correct typo in roxygen docs

8b5e950

Rename extract_version_n to extract_version

7f082fd