Fix validation errors #1022

Bisaloo · 2021-10-06T14:33:13Z

Discovered with:

test <- ForecastHubValidations::validate_repository("data-processed")
ForecastHubValidations::check_for_errors(test)

metadata-CovidMetrics-epiBATS.txt: Metadata file has to be consistent with schema specifications
- /methods must NOT have more than 200 characters
metadata-FIAS_FZJ-Epi1Ger.txt: Metadata file has to be consistent with schema specifications
- /team_funding must be string
metadata-ITWW-county_repro.txt: Metadata file has to be consistent with schema specifications
- /citation must be string
metadata-KITmetricslab-bivariate_branching.txt: Metadata file has to be consistent with schema specifications (Re-naming KITmetricslab-bivariate_branching, shortening model description. #1031)
- /model_abbr must NOT have more than 32 characters
- /methods must NOT have more than 200 characters
2021-02-28-Karlen-pypm.csv: Forecast data has to be formed of the expected columns with correct type
- /value/169 must be >= 0
- /value/170 must be >= 0
2021-03-07-Karlen-pypm.csv: Forecast data has to be formed of the expected columns with correct type
- /value/169 must be >= 0
2021-04-04-Karlen-pypm.csv: Forecast data has to be formed of the expected columns with correct type
- /value/169 must be >= 0
metadata-MIT_CovidAnalytics-DELPHI.txt: Metadata file has to be consistent with schema specifications
- /team_funding must be string
metadata-USyd-OneModelMan.txt: Metadata file has to be consistent with schema specifications
- /team_funding must be string
metadata-bisop-seirfilterlite.txt: Metadata file has to be consistent with schema specifications
- /website_url must match format "uri"
2021-02-15-itwm-dSEIR.csv: Forecast data has to be formed of the expected columns with correct type
- /value/7 must be >= 0
- /value/102 must be >= 0
- /value/103 must be >= 0
- /value/106 must be >= 0
- /value/107 must be >= 0
- /value/110 must be >= 0
- /value/111 must be >= 0
- /value/115 must be >= 0
- /value/119 must be >= 0
- /value/123 must be >= 0
2021-02-22-itwm-dSEIR.csv: Forecast data has to be formed of the expected columns with correct type
- /value/6 must be >= 0
- /value/7 must be >= 0
- /value/11 must be >= 0
- /value/15 must be >= 0
- /value/101 must be >= 0
- /value/102 must be >= 0
- /value/103 must be >= 0
- /value/105 must be >= 0
- /value/106 must be >= 0
- /value/107 must be >= 0
- /value/109 must be >= 0
- /value/110 must be >= 0
- /value/111 must be >= 0
- /value/114 must be >= 0
- /value/115 must be >= 0
- /value/118 must be >= 0
- /value/119 must be >= 0
- /value/122 must be >= 0
- /value/123 must be >= 0
- /value/127 must be >= 0
- /value/131 must be >= 0
2021-03-01-itwm-dSEIR.csv: Forecast data has to be formed of the expected columns with correct type
- /value/7 must be >= 0
- /value/11 must be >= 0
- /value/101 must be >= 0
- /value/102 must be >= 0
- /value/103 must be >= 0
- /value/106 must be >= 0
- /value/107 must be >= 0
- /value/110 must be >= 0
- /value/111 must be >= 0
- /value/114 must be >= 0
- /value/115 must be >= 0
- /value/119 must be >= 0
- /value/123 must be >= 0
2021-03-08-itwm-dSEIR.csv: Forecast data has to be formed of the expected columns with correct type
- /value/7 must be >= 0
- /value/11 must be >= 0
- /value/102 must be >= 0
- /value/103 must be >= 0
- /value/106 must be >= 0
- /value/107 must be >= 0
- /value/110 must be >= 0
- /value/111 must be >= 0
- /value/115 must be >= 0
- /value/119 must be >= 0
2021-03-15-itwm-dSEIR.csv: Forecast data has to be formed of the expected columns with correct type
- /value/102 must be >= 0
- /value/103 must be >= 0
- /value/106 must be >= 0
- /value/107 must be >= 0
- /value/111 must be >= 0
- /value/115 must be >= 0
2021-03-22-itwm-dSEIR.csv: Forecast data has to be formed of the expected columns with correct type
- /value/103 must be >= 0
2021-04-12-itwm-dSEIR.csv: Forecast data has to be formed of the expected columns with correct type
- /value/102 must be >= 0
- /value/103 must be >= 0
- /value/106 must be >= 0
- /value/107 must be >= 0
- /value/111 must be >= 0
- /value/115 must be >= 0
2021-05-10-itwm-dSEIR.csv: Forecast data has to be formed of the expected columns with correct type
- /value/7 must be >= 0
- /value/11 must be >= 0
- /value/103 must be >= 0
- /value/107 must be >= 0
- /value/111 must be >= 0
2021-05-17-itwm-dSEIR.csv: Forecast data has to be formed of the expected columns with correct type
- /value/7 must be >= 0
- /value/11 must be >= 0
- /value/15 must be >= 0

Bisaloo · 2021-10-06T15:12:10Z

@kathsherratt, @sbfnk, what did we decide should happen for negative values? Set them to 0, right?

Bisaloo · 2021-10-06T16:38:24Z

I see #104.

I understand the reasoning but if we're trying to build a general solution, then it's difficult / kind of awkward to bake exceptions into it.

If we really don't want to modify these forecasts, then we could have a list of exceptions that automatically pass the validation.
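
A minimal sketch of what such an exception list could look like, in base R. The function `check_non_negative()` and the `known_exceptions` vector are hypothetical names introduced here for illustration and are not part of ForecastHubValidations. The two exempt file names are taken from the validation output above, and `2021-06-07-itwm-dSEIR.csv` is an invented file name used only to show the non-exempt case.

```r
# Hypothetical allowlist of files exempt from the non-negativity check.
# The mechanism is an assumption, not an existing package feature.
known_exceptions <- c(
  "2021-02-15-itwm-dSEIR.csv",
  "2021-05-17-itwm-dSEIR.csv"
)

# Return TRUE if the file passes: either it is explicitly listed as an
# exception, or all of its (non-missing) values are >= 0.
check_non_negative <- function(file_name, values, exceptions = known_exceptions) {
  if (basename(file_name) %in% exceptions) {
    return(TRUE)
  }
  all(values >= 0, na.rm = TRUE)
}

exempt_result <- check_non_negative("2021-02-15-itwm-dSEIR.csv", c(10, -5))
strict_result <- check_non_negative("2021-06-07-itwm-dSEIR.csv", c(10, -5))
```

Here the exempt file passes despite the negative value, while the same values in any other file fail. The check stays a hard failure for all new submissions, and the list of grandfathered files is explicit and reviewable.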

kathsherratt · 2021-10-06T20:46:32Z

Ah yep, we didn't have a check for <0 forecast values in the first week. So when we added that to the validation checks, we made it conditional on checking submissions after the first week.

I agree that is a bit awkward to put into a generalised validation framework. As a workaround, could we implement this check as a "note" rather than a "fail" result? It wouldn't stop the check but would produce a flagged note (instead of a tick or cross), a bit like the "notes" produced when running CRAN package checks.

Alternatively (or additionally), we could just replace any <0 value in the repo with 0 and check it's okay with the teams affected, as we did in the update to round values. I think that's my preference! What do you think @sbfnk?
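
The replacement itself could be as simple as the following base R sketch. It assumes a forecast table with a numeric `value` column, as named in the validation output. The helper `truncate_at_zero()` and the toy data are made up for illustration.

```r
# Toy forecast table; only the `value` column name comes from the thread,
# the rows themselves are invented.
forecast <- data.frame(
  target = c("1 wk ahead inc case", "2 wk ahead inc case", "3 wk ahead inc case"),
  value  = c(120, -35, 0)
)

# Truncate negative values at zero, leaving missing values untouched.
truncate_at_zero <- function(df) {
  negative <- !is.na(df$value) & df$value < 0
  df$value[negative] <- 0
  df
}

fixed <- truncate_at_zero(forecast)
fixed$value
```

Only the negative entry changes, so the diff shown to the affected teams stays small and easy to review.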

Bisaloo · 2021-10-07T10:58:44Z

> Ah yep, we didn't have a check for <0 forecast values in the first week. So when we added that to the validation checks, we made it conditional on checking submissions after the first week.

But we have negative values as late as mid-May, which is much later than the first week (and later than the 2021-03-22 date mentioned in #104).

> As a workaround, could we implement this check as a "note" rather than a "fail" result? It wouldn't stop the check but would produce a flagged note (instead of a tick or cross), a bit like the "notes" produced when running CRAN package checks.

I'm against this, as these kinds of "soft" checks are equivalent to no check at all, IMO. I guess this is what happened for the negative values in 2021-05-17-itwm-dSEIR.csv: there was a warning, but it flew under the radar since it was just a warning and not a hard failure. This is similar to Hadley's position on warning()s in R:

> Warnings occupy a somewhat challenging place between messages (“you should know about this”) and errors (“you must fix this!”), and it’s hard to give precise advice on when to use them. Generally, be restrained, as warnings are easy to miss if there’s a lot of other output, and you don’t want your function to recover too easily from clearly invalid input. In my opinion, base R tends to overuse warnings, and many warnings in base R would be better off as errors.

Source
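
To illustrate the difference in base R, here is a small sketch of a "soft" and a "hard" version of the same check. Both function names are hypothetical and neither reflects how ForecastHubValidations is implemented.

```r
# Soft check: emits a warning but still returns, so the caller carries on.
soft_check <- function(values) {
  if (any(values < 0, na.rm = TRUE)) {
    warning("negative values found")
  }
  invisible(values)
}

# Hard check: stops with an error, so invalid input cannot pass silently.
hard_check <- function(values) {
  if (any(values < 0, na.rm = TRUE)) {
    stop("negative values found")
  }
  invisible(values)
}

x <- c(5, -1)

# The soft check completes even when the warning is suppressed or overlooked.
soft_result <- suppressWarnings(soft_check(x))

# The hard check never returns the values; the error has to be handled.
hard_result <- tryCatch(hard_check(x), error = function(e) conditionMessage(e))
```

With the soft check, the invalid values flow through unchanged. With the hard check, the only thing the caller gets back is the error message.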

Bisaloo · 2021-10-07T10:59:02Z

On an unrelated note, should we enforce or remove the character limits in methods (authors can always expand in methods_long) and model_abbr?

kathsherratt · 2021-10-07T12:30:46Z

> But we have negative values as late as mid-May, which is much later than the first week (and later than the 2021-03-22 date mentioned in #104).

Yes, as you guessed, I think that's because it was implemented as a warning comment on the PR, not a check that would fail the overall validation. So I guess some slipped through...

It looks like only 2 teams had negative forecasts, Karlen-pypm and itwm-dSEIR. Karlen-pypm has only 4 predictions <0. The remainder are itwm-dSEIR, with 75 predictions for Germany <0, and unfortunately some of these negative values are quite large (up to -250825) :( My vote is for asking that team if they'd be happy for us to replace with 0.
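
A summary like this can be produced with a few lines of base R. The sketch below assumes the forecasts are already combined into one data frame with `model` and `value` columns. The model names come from this thread, but the values are invented and much smaller than the real data.

```r
# Toy data: model names are from the thread, values are invented.
forecasts <- data.frame(
  model = c("Karlen-pypm", "Karlen-pypm", "itwm-dSEIR", "itwm-dSEIR", "itwm-dSEIR"),
  value = c(10, -2, -250, 40, -7)
)

# Keep only the rows with negative predictions.
negative <- forecasts[forecasts$value < 0, ]

# Number of negative predictions per model.
n_negative <- table(negative$model)

# Most negative value per model.
worst <- tapply(negative$value, negative$model, min)
```

`n_negative` gives the count per team and `worst` the most extreme value, which is the information needed when contacting the affected teams.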

sbfnk · 2021-10-07T12:34:45Z

> It looks like only 2 teams had negative forecasts, Karlen-pypm and itwm-dSEIR. Karlen-pypm has only 4 predictions <0. The remainder are itwm-dSEIR, with 75 predictions for Germany <0, and unfortunately some of these negative values are quite large (up to -250825) :( My vote is for asking that team if they'd be happy for us to replace with 0.

That sounds good to me.

sbfnk · 2021-10-07T12:35:39Z

> On an unrelated note, should we enforce or remove the character limits in methods (authors can always expand in methods_long) and model_abbr?

Probably a good idea to enforce - nice to have a "short" version, e.g. for the web site.
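
For reference, enforcing the limits amounts to a length check per field. The base R sketch below uses the limits quoted in the validation output (32 characters for `model_abbr`, 200 for `methods`). The helper `too_long()` is hypothetical, and the metadata is assumed to be already parsed into a list, with a placeholder `methods` text.

```r
# Limits as quoted in the validation output.
limits <- c(model_abbr = 32, methods = 200)

# Hypothetical metadata entry, already parsed into a list.
metadata <- list(
  model_abbr = "KITmetricslab-bivariate_branching",
  methods = "A short description of the model."
)

# Return the names of the fields whose text exceeds the allowed length.
too_long <- function(metadata, limits) {
  n_chars <- vapply(metadata[names(limits)], nchar, integer(1))
  names(n_chars)[n_chars > limits]
}

flagged <- too_long(metadata, limits)
```

With this input, only `model_abbr` is flagged, since the abbreviation is 33 characters long.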

Bisaloo · 2021-10-07T16:09:01Z

Hi @jbracher 👋, could you have a look at the metadata of KITmetricslab-bivariate_branching please?

- The current model_abbr is too long with 33 characters while we have a maximum of 32 characters.
- The current methods is too long with 307 characters while we have a maximum of 200 characters. You are free to expand as much as you'd like in the methods_long section.

Thanks a lot 🙏!

Bisaloo · 2021-11-22T21:49:38Z

This has been fixed.

erwinlagu · 2022-01-09T16:11:29Z

Nice job

Bisaloo mentioned this issue Oct 6, 2021

Fix validation issues #1024

Merged

Bisaloo closed this as completed Nov 22, 2021
