Fix validation errors #1022

Closed
7 of 19 tasks
Bisaloo opened this issue Oct 6, 2021 · 11 comments

@Bisaloo
Member

Bisaloo commented Oct 6, 2021

Discovered with:

test <- ForecastHubValidations::validate_repository("data-processed")
ForecastHubValidations::check_for_errors(test)
  • metadata-CovidMetrics-epiBATS.txt: Metadata file has to be consistent with schema specifications
    • /methods must NOT have more than 200 characters
  • metadata-FIAS_FZJ-Epi1Ger.txt: Metadata file has to be consistent with schema specifications
    • /team_funding must be string
  • metadata-ITWW-county_repro.txt: Metadata file has to be consistent with schema specifications
    • /citation must be string
  • metadata-KITmetricslab-bivariate_branching.txt: Metadata file has to be consistent with schema specifications (Re-naming KITmetricslab-bivariate_branching, shortening model description. #1031)
    • /model_abbr must NOT have more than 32 characters
    • /methods must NOT have more than 200 characters
  • 2021-02-28-Karlen-pypm.csv: Forecast data has to be formed of the expected columns with correct type
    • /value/169 must be >= 0
    • /value/170 must be >= 0
  • 2021-03-07-Karlen-pypm.csv: Forecast data has to be formed of the expected columns with correct type
    • /value/169 must be >= 0
  • 2021-04-04-Karlen-pypm.csv: Forecast data has to be formed of the expected columns with correct type
    • /value/169 must be >= 0
  • metadata-MIT_CovidAnalytics-DELPHI.txt: Metadata file has to be consistent with schema specifications
    • /team_funding must be string
  • metadata-USyd-OneModelMan.txt: Metadata file has to be consistent with schema specifications
    • /team_funding must be string
  • metadata-bisop-seirfilterlite.txt: Metadata file has to be consistent with schema specifications
    • /website_url must match format "uri"
  • 2021-02-15-itwm-dSEIR.csv: Forecast data has to be formed of the expected columns with correct type
    • /value/7 must be >= 0
    • /value/102 must be >= 0
    • /value/103 must be >= 0
    • /value/106 must be >= 0
    • /value/107 must be >= 0
    • /value/110 must be >= 0
    • /value/111 must be >= 0
    • /value/115 must be >= 0
    • /value/119 must be >= 0
    • /value/123 must be >= 0
  • 2021-02-22-itwm-dSEIR.csv: Forecast data has to be formed of the expected columns with correct type
    • /value/6 must be >= 0
    • /value/7 must be >= 0
    • /value/11 must be >= 0
    • /value/15 must be >= 0
    • /value/101 must be >= 0
    • /value/102 must be >= 0
    • /value/103 must be >= 0
    • /value/105 must be >= 0
    • /value/106 must be >= 0
    • /value/107 must be >= 0
    • /value/109 must be >= 0
    • /value/110 must be >= 0
    • /value/111 must be >= 0
    • /value/114 must be >= 0
    • /value/115 must be >= 0
    • /value/118 must be >= 0
    • /value/119 must be >= 0
    • /value/122 must be >= 0
    • /value/123 must be >= 0
    • /value/127 must be >= 0
    • /value/131 must be >= 0
  • 2021-03-01-itwm-dSEIR.csv: Forecast data has to be formed of the expected columns with correct type
    • /value/7 must be >= 0
    • /value/11 must be >= 0
    • /value/101 must be >= 0
    • /value/102 must be >= 0
    • /value/103 must be >= 0
    • /value/106 must be >= 0
    • /value/107 must be >= 0
    • /value/110 must be >= 0
    • /value/111 must be >= 0
    • /value/114 must be >= 0
    • /value/115 must be >= 0
    • /value/119 must be >= 0
    • /value/123 must be >= 0
  • 2021-03-08-itwm-dSEIR.csv: Forecast data has to be formed of the expected columns with correct type
    • /value/7 must be >= 0
    • /value/11 must be >= 0
    • /value/102 must be >= 0
    • /value/103 must be >= 0
    • /value/106 must be >= 0
    • /value/107 must be >= 0
    • /value/110 must be >= 0
    • /value/111 must be >= 0
    • /value/115 must be >= 0
    • /value/119 must be >= 0
  • 2021-03-15-itwm-dSEIR.csv: Forecast data has to be formed of the expected columns with correct type
    • /value/102 must be >= 0
    • /value/103 must be >= 0
    • /value/106 must be >= 0
    • /value/107 must be >= 0
    • /value/111 must be >= 0
    • /value/115 must be >= 0
  • 2021-03-22-itwm-dSEIR.csv: Forecast data has to be formed of the expected columns with correct type
    • /value/103 must be >= 0
  • 2021-04-12-itwm-dSEIR.csv: Forecast data has to be formed of the expected columns with correct type
    • /value/102 must be >= 0
    • /value/103 must be >= 0
    • /value/106 must be >= 0
    • /value/107 must be >= 0
    • /value/111 must be >= 0
    • /value/115 must be >= 0
  • 2021-05-10-itwm-dSEIR.csv: Forecast data has to be formed of the expected columns with correct type
    • /value/7 must be >= 0
    • /value/11 must be >= 0
    • /value/103 must be >= 0
    • /value/107 must be >= 0
    • /value/111 must be >= 0
  • 2021-05-17-itwm-dSEIR.csv: Forecast data has to be formed of the expected columns with correct type
    • /value/7 must be >= 0
    • /value/11 must be >= 0
    • /value/15 must be >= 0
@Bisaloo
Member Author

Bisaloo commented Oct 6, 2021

@kathsherratt, @sbfnk, what did we decide should happen for negative values? Set them to 0, right?

@Bisaloo
Member Author

Bisaloo commented Oct 6, 2021

I see #104.

I understand the reasoning but if we're trying to build a general solution, then it's difficult / kind of awkward to bake exceptions into it.

If we really don't want to modify these forecasts, then we could have a list of exceptions that automatically pass the validation.
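A minimal sketch of what such an exception list could look like, in base R. The helper `check_non_negative()` and the `known_exceptions` vector are hypothetical names for illustration; they are not part of ForecastHubValidations.

```r
# Hypothetical allowlist: files already known to contain negative values.
known_exceptions <- c(
  "2021-02-28-Karlen-pypm.csv",
  "2021-05-17-itwm-dSEIR.csv"
)

# Hypothetical helper: TRUE if the file passes the non-negativity check,
# either because all values are >= 0 or because the file is allowlisted.
check_non_negative <- function(file_name, values, exceptions = character()) {
  if (basename(file_name) %in% exceptions) {
    return(TRUE)
  }
  all(values >= 0, na.rm = TRUE)
}

# An allowlisted file passes even though it contains a negative value.
old_file <- check_non_negative("2021-05-17-itwm-dSEIR.csv", c(10, -3), known_exceptions)

# A new file with the same values still fails.
new_file <- check_non_negative("2021-06-07-itwm-dSEIR.csv", c(10, -3), known_exceptions)
```

The check itself stays strict for every new submission; only the files named explicitly are exempt, so the exception list documents the historical cases rather than weakening the rule.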

@kathsherratt
Collaborator

kathsherratt commented Oct 6, 2021

Ah yep, we didn't have a check for <0 forecast values in the first week. So when we added that to the validation checks, we made it conditional on checking submissions after the first week.

I agree that is a bit awkward to put into a generalised validation framework. As a workaround, could we implement this check as a "note" rather than a "fail" result? It wouldn't stop the check but would result in a flagged note (instead of a tick or cross). (I guess that's a bit like the "notes" produced when running CRAN package checks.)

Alternatively (or also), we could just replace any <0 value in the repo with 0 and check it's okay with the teams affected, like we did in the update to round values. I think that's my preference! What do you think @sbfnk?
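As a rough illustration of the replace-with-0 option, here is a base R sketch. The toy data frame, its column names, and the helper `truncate_negative()` are assumptions made for the example, not the code used in the repository.

```r
# Toy forecast table; the column names are assumptions for illustration.
forecast <- data.frame(
  target = c("1 wk ahead inc case", "2 wk ahead inc case", "3 wk ahead inc case"),
  value = c(120, -35, 0)
)

# Hypothetical helper: replace negative values with 0 and record how many
# values were changed, so the affected teams can be told what was modified.
truncate_negative <- function(df) {
  is_negative <- !is.na(df$value) & df$value < 0
  df$value[is_negative] <- 0
  attr(df, "n_replaced") <- sum(is_negative)
  df
}

fixed <- truncate_negative(forecast)
```

Keeping a count of the replaced values makes it easier to report the change back to each team before committing it.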

@Bisaloo
Member Author

Bisaloo commented Oct 7, 2021

> Ah yep, we didn't have a check for <0 forecast values in the first week. So when we added that to the validation checks, we made it conditional on checking submissions after the first week.

But we have negative values as late as mid-May, which is much later than the first week (and later than the 2021-03-22 date mentioned in #104).

> As a workaround, could we implement this check as a "note" rather than a "fail" result? It wouldn't stop the check but would result in a flagged note (instead of a tick or cross). (I guess that's a bit like the "notes" produced when running CRAN package checks.)

I'm against this, as this kind of "soft" check is equivalent to no check at all IMO. I guess this is what happened for the negative values in 2021-05-17-itwm-dSEIR.csv: there was a warning, but it flew under the radar because it was just a warning and not a hard failure. This is similar to Hadley's position on warning()s in R:

> Warnings occupy a somewhat challenging place between messages (“you should know about this”) and errors (“you must fix this!”), and it’s hard to give precise advice on when to use them. Generally, be restrained, as warnings are easy to miss if there’s a lot of other output, and you don’t want your function to recover too easily from clearly invalid input. In my opinion, base R tends to overuse warnings, and many warnings in base R would be better off as errors.

Source: Hadley Wickham, Advanced R (chapter on conditions)
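The difference between a soft and a hard check can be sketched in base R as follows. Both functions are hypothetical and only illustrate the argument; they are not how ForecastHubValidations reports results.

```r
# Hypothetical check in two flavours: one warns, one fails.
check_soft <- function(values) {
  if (any(values < 0, na.rm = TRUE)) {
    warning("negative forecast values found")
  }
  invisible(TRUE)
}

check_hard <- function(values) {
  if (any(values < 0, na.rm = TRUE)) {
    stop("negative forecast values found")
  }
  invisible(TRUE)
}

values <- c(5, -2, 8)

# The soft check still returns TRUE; the warning is the only trace,
# and it disappears as soon as someone suppresses or overlooks it.
soft_result <- suppressWarnings(check_soft(values))

# The hard check cannot be ignored: the caller has to handle the error.
hard_result <- tryCatch(check_hard(values), error = function(e) FALSE)
```

With the soft variant, a pipeline that only looks at the return value proceeds as if nothing were wrong, which matches how the negative values appear to have slipped through.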

@Bisaloo
Member Author

Bisaloo commented Oct 7, 2021

On an unrelated note, should we enforce or remove the character limits in methods (authors can always expand in methods_long) and model_abbr?

@kathsherratt
Collaborator

> But we have negative values as late as mid-May, which is much later than the first week (and later than the 2021-03-22 date mentioned in #104).

Yes, as you guessed, I think that's because it was implemented as a warning comment on the PR, not as a check that would fail the overall validation. So I guess some slipped through...

It looks like only 2 teams had negative forecasts, Karlen-pypm and itwm-dSEIR. Karlen-pypm has only 4 predictions <0. The remainder are itwm-dSEIR, with 75 predictions for Germany <0, and unfortunately some of these negative values are quite large (up to -250825) :( My vote is for asking that team if they'd be happy for us to replace them with 0.
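For reference, a per-model summary like the one above can be produced along these lines in base R. The data frame here is a made-up toy example with assumed column names; the counts in this comment come from the real repository, not from this sketch.

```r
# Toy data standing in for the combined forecasts; column names are assumptions.
forecasts <- data.frame(
  model = c("Karlen-pypm", "Karlen-pypm", "itwm-dSEIR", "itwm-dSEIR", "itwm-dSEIR"),
  value = c(12, -1, -40, 300, -7)
)

# Keep only the rows with a negative prediction.
negative <- forecasts[forecasts$value < 0, ]

# Number of negative predictions per model.
n_negative <- table(negative$model)

# Most extreme negative value per model.
worst <- tapply(negative$value, negative$model, min)
```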

@sbfnk
Contributor

sbfnk commented Oct 7, 2021

> It looks like only 2 teams had negative forecasts, Karlen-pypm and itwm-dSEIR. Karlen-pypm has only 4 predictions <0. The remainder are itwm-dSEIR, with 75 predictions for Germany <0, and unfortunately some of these negative values are quite large (up to -250825) :( My vote is for asking that team if they'd be happy for us to replace them with 0.

That sounds good to me.

@sbfnk
Contributor

sbfnk commented Oct 7, 2021

> On an unrelated note, should we enforce or remove the character limits in methods (authors can always expand in methods_long) and model_abbr?

Probably a good idea to enforce - nice to have a "short" version, e.g. for the web site.

@Bisaloo
Member Author

Bisaloo commented Oct 7, 2021

Hi @jbracher 👋, could you have a look at the metadata of KITmetricslab-bivariate_branching please?

  • the current model_abbr is too long with 33 characters while we have a maximum of 32 characters
  • the current methods is too long with 307 characters while we have a maximum of 200 characters. You are free to expand as much as you'd like in the methods_long section.

Thanks a lot 🙏!
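The two length limits mentioned above can be checked with a few lines of base R. This is only a sketch: the `metadata` list is made up (the real `methods` text is replaced by a 307-character placeholder) and `check_lengths()` is a hypothetical helper, whereas the actual validation is driven by the metadata schema.

```r
# Limits quoted in this thread; the metadata list itself is made up.
limits <- c(model_abbr = 32, methods = 200)

metadata <- list(
  model_abbr = "KITmetricslab-bivariate_branching",
  methods = strrep("x", 307)  # placeholder text of the reported length
)

# Hypothetical helper: return one message per field that is too long.
check_lengths <- function(metadata, limits) {
  msgs <- character()
  for (field in names(limits)) {
    n <- nchar(metadata[[field]])
    if (n > limits[[field]]) {
      msgs <- c(
        msgs,
        sprintf("/%s has %d characters, maximum is %d", field, n, limits[[field]])
      )
    }
  }
  msgs
}

msgs <- check_lengths(metadata, limits)
```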

@Bisaloo
Member Author

Bisaloo commented Nov 22, 2021

This has been fixed.

@Bisaloo Bisaloo closed this as completed Nov 22, 2021
@erwinlagu

Nice job
