Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support v3.0.0 schema sample specification #19

Merged
merged 39 commits into from
Jun 19, 2024

Conversation

annakrystalli
Copy link
Member

@annakrystalli annakrystalli commented Apr 29, 2024

This PR will:

For further details and context see here

@annakrystalli annakrystalli marked this pull request as draft April 29, 2024 13:57
Copy link

github-actions bot commented Apr 29, 2024

@annakrystalli annakrystalli marked this pull request as ready for review April 30, 2024 16:13
@annakrystalli annakrystalli added this to the sample output_type v1.0 milestone Apr 30, 2024
@annakrystalli annakrystalli changed the title [WIP] Support v3.0.0 schema sample specification Support v3.0.0 schema sample specification Apr 30, 2024
R/utils.R Outdated Show resolved Hide resolved
R/create_output_type_item.R Outdated Show resolved Hide resolved
R/create_output_type_item.R Outdated Show resolved Hide resolved
R/create_output_type_item.R Outdated Show resolved Hide resolved
R/create_output_type_item.R Outdated Show resolved Hide resolved
Comment on lines 354 to 360
# Validate that compound_taskid_set values are valid task ids for a
# given modeling task group in a given round.
# Returns NULL if not applicable or check passes and error df row if check fails.
validate_mt_sample_comp_tids <- function(model_task_grp,
model_task_i,
round_i,
schema) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Questions about what's going on here and in check_compound_taskids_valid:

  • The idea is that check_compound_taskids_valid is used when creating a new config file, but this function is used when validating an existing config file?
  • There may be value in standardizing on naming conventions:
    • it's not clear to me why we've used validate here and check in the name of the other function. The check function is actually aborting, while this one generates an error message in a data frame that is collected for later use? Are there standard conventions for this kind of thing? I went and looked at checkmate docs, but they don't have things starting with validate.
    • noting comp_tids here vs. compound_taskids in the other function name
  • would there be any value or possibility to refactor the check logic here and in that other function? It's not much code and it's not exactly identical, so maybe not. But it is quite similar.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So there's two places that dynamic validations that can't be encoded in our schema need to be performed:

  1. when validating a tasks.json file (validate_mt_sample_comp_tids()). I used validate here because it is part of the config validation process which itself is based on the concept of using json schema to validate json data, powered by the package jsonvalidate. The output of the validate function conform to the output of the jsonvalidate::json_validate(), specifically the errors attribute tbl which can then be passed to view_config_val_errors().
  2. when checking inputs to functions that create config programmatically. Because we are checking inputs to a function effectively, here I opted for check. e.g. a la rlang:: check_rquired() etc which indeed is designed to throw an error if the condition is not met.

I believe different languages have related but slightly different conventions (e.g. python uses assert in testing while we use expect) and I don't know of any resource is R that dictate the use of specific verbs for specific situations (I believe the checkmate convention is just their own internal convention). As such I've just tried to keep these internally consistent to separate functions that are used for validation of json vs functions used to check inputs to functions generating config. I've also kept such functions separate because although they might be checking the same thing, the inputs and outputs/side effects of each function are so different and take up so much of the logic in each function that trying to shoehorn them into a single function didn't make sense to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks! all sounds good. the one remaining question then is about comp_tids vs. compound_taskids

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 33a4859

Copy link
Contributor

@elray1 elray1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This looks good, I have made a couple of minor comments.

Additionally, I noticed that on lines 146-150 of R/create_output_type_item.R, we have the following docstring with some duplication (github didn't let me comment on this or suggest changes as part of this review because this PR didn't introduce this, but I thought maybe we could just clean it up here):

#' This can be combined with other building blocks which can then be written as
#' or appended to `tasks.json` Hub config files.
#' output type.
#' This can be combined with other building blocks which can then be written as
#' or appended to `tasks.json` Hub config files.

Copy link

codecov bot commented May 7, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 87.92%. Comparing base (1194ab2) to head (3399d17).
Report is 4 commits behind head on main.

❗ Current head 3399d17 differs from pull request most recent head 9a7afa3. Consider uploading reports for the commit 9a7afa3 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #19      +/-   ##
==========================================
+ Coverage   87.23%   87.92%   +0.69%     
==========================================
  Files          20       21       +1     
  Lines        1637     1830     +193     
==========================================
+ Hits         1428     1609     +181     
- Misses        209      221      +12     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@annakrystalli
Copy link
Member Author

Thanks! This looks good, I have made a couple of minor comments.

Additionally, I noticed that on lines 146-150 of R/create_output_type_item.R, we have the following docstring with some duplication (github didn't let me comment on this or suggest changes as part of this review because this PR didn't introduce this, but I thought maybe we could just clean it up here):

#' This can be combined with other building blocks which can then be written as
#' or appended to `tasks.json` Hub config files.
#' output type.
#' This can be combined with other building blocks which can then be written as
#' or appended to `tasks.json` Hub config files.

Thanks! Fixed this in f026d6c

@annakrystalli annakrystalli self-assigned this May 29, 2024
@annakrystalli annakrystalli merged commit 719cd09 into main Jun 19, 2024
8 checks passed
@annakrystalli annakrystalli deleted the feature/sample-support branch June 19, 2024 07:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
2 participants