Handle round_ids other than dates? #81

Closed
annakrystalli opened this issue Apr 24, 2024 · 7 comments

@annakrystalli
Member

annakrystalli commented Apr 24, 2024

Background

Since the beginning of the project we have discussed, and in general planned for, supporting round_ids other than dates. So far this has not been necessary, and our validations have focused on the assumption that round_ids will be dates, largely because all known hubs do indeed use dates as round IDs. Some discussion around this topic during development of the validation framework can be found here: #13

However, the push to convert historical hubs to hubverse style hubs has resurfaced the question of supporting non-date round ids and the need to assess implementation implications and weigh them against the benefits of supporting this feature.

Implications of using non-dates as round ids

Use of a non-date round id has some important implications, most importantly on how submission windows are configured:

  • When the round ID is a date and is configured as a task ID contained in the file, multiple rounds can be configured concisely in a single round configuration: list each valid round ID as an option of the relevant task ID, and set the submission window to be calculated relative to the value of the round ID.
  • If a character string is used as a round ID instead, the submission window can no longer be specified relative to a date contained within the file (or filename). This means that each round would need to be configured individually, with an explicitly set submission window start and end date for each round ID. For example, if the following simplified FluSight tasks.json config is changed to use epiweek codes (in the format YYYY-EPIWEEK) instead of ISO dates, then specifying just 3 rounds triples the size of the config, which consists primarily of repeated information (only the origin_epiweek value and the submission window specification change between rounds):

    date round id

        {
        "schema_version": "https://raw.githubusercontent.com/Infectious-Disease-Modeling-Hubs/schemas/main/v2.0.0/tasks-schema.json",
        "rounds": [{
                "round_id_from_variable": true,
                "round_id": "origin_date",
                "model_tasks": [{
                    "task_ids": {
                        "origin_date": {
                            "required": null,
                            "optional": [
                                "2022-12-12", "2022-12-19", "2022-12-26"
                            ]
                        },
                        "target": {
                            "required": null,
                            "optional": ["wk ahead inc flu hosp"]
                        },
                        "horizon": {
                            "required": [2],
                            "optional": [1]
                        },
                        "location": {
                            "required": ["US"],
                            "optional": [
                                "01",
                                "02"
                            ]
                        }
                    },
                    "output_type": {
                        "mean": {
                            "output_type_id": {
                                "required": null,
                                "optional": ["NA"]
                            },
                            "value": {
                                "type": "double",
                                "minimum": 0
                            }
                        }
                    },
                    "target_metadata": [{
                        "target_id": "wk ahead inc flu hosp",
                        "target_name": "weekly influenza hospitalization incidence",
                        "target_units": "rate per 100,000 population",
                        "target_keys": {
                            "target": ["wk ahead inc flu hosp"]
                        },
                        "target_type": "discrete",
                        "description": "This target represents the counts of new hospitalizations per horizon week.",
                        "is_step_ahead": true,
                        "time_unit": "week"
                    }]
                }],
                "submissions_due": {
                    "relative_to": "origin_date",
                    "start": -6,
                    "end": 2
                }
            }
    
        ]
    }

    epiweek code round id

    {
            "schema_version": "https://raw.githubusercontent.com/Infectious-Disease-Modeling-Hubs/schemas/main/v2.0.0/tasks-schema.json",
            "rounds": [{
                    "round_id_from_variable": true,
                    "round_id": "origin_epiweek",
                    "model_tasks": [{
                        "task_ids": {
                            "origin_epiweek": {
                                "required": ["2022-50"],
                                "optional": null
                            },
                            "target": {
                                "required": null,
                                "optional": ["wk ahead inc flu hosp"]
                            },
                            "horizon": {
                                "required": [2],
                                "optional": [1]
                            },
                            "location": {
                                "required": ["US"],
                                "optional": [
                                    "01",
                                    "02"
                                ]
                            }
                        },
                        "output_type": {
                            "mean": {
                                "output_type_id": {
                                    "required": null,
                                    "optional": ["NA"]
                                },
                                "value": {
                                    "type": "double",
                                    "minimum": 0
                                }
                            }
                        },
                        "target_metadata": [{
                            "target_id": "wk ahead inc flu hosp",
                            "target_name": "weekly influenza hospitalization incidence",
                            "target_units": "rate per 100,000 population",
                            "target_keys": {
                                "target": ["wk ahead inc flu hosp"]
                            },
                            "target_type": "discrete",
                            "description": "This target represents the counts of new hospitalizations per horizon week.",
                            "is_step_ahead": true,
                            "time_unit": "week"
                        }]
                    }],
                    "submissions_due": {
                        "start": "2022-12-06",
                        "end": "2022-12-14"
                    }
                },
                {
                    "round_id_from_variable": true,
                    "round_id": "origin_epiweek",
                    "model_tasks": [{
                        "task_ids": {
                            "origin_epiweek": {
                                "required": ["2022-51"],
                                "optional": null
                            },
                            "target": {
                                "required": null,
                                "optional": ["wk ahead inc flu hosp"]
                            },
                            "horizon": {
                                "required": [2],
                                "optional": [1]
                            },
                            "location": {
                                "required": ["US"],
                                "optional": [
                                    "01",
                                    "02"
                                ]
                            }
                        },
                        "output_type": {
                            "mean": {
                                "output_type_id": {
                                    "required": null,
                                    "optional": ["NA"]
                                },
                                "value": {
                                    "type": "double",
                                    "minimum": 0
                                }
                            }
                        },
                        "target_metadata": [{
                            "target_id": "wk ahead inc flu hosp",
                            "target_name": "weekly influenza hospitalization incidence",
                            "target_units": "rate per 100,000 population",
                            "target_keys": {
                                "target": ["wk ahead inc flu hosp"]
                            },
                            "target_type": "discrete",
                            "description": "This target represents the counts of new hospitalizations per horizon week.",
                            "is_step_ahead": true,
                            "time_unit": "week"
                        }]
                    }],
                    "submissions_due": {
                        "start": "2022-12-13",
                        "end": "2022-12-21"
                    }
                },
                {
                    "round_id_from_variable": true,
                    "round_id": "origin_epiweek",
                    "model_tasks": [{
                        "task_ids": {
                            "origin_epiweek": {
                                "required": ["2022-52"],
                                "optional": null
                            },
                            "target": {
                                "required": null,
                                "optional": ["wk ahead inc flu hosp"]
                            },
                            "horizon": {
                                "required": [2],
                                "optional": [1]
                            },
                            "location": {
                                "required": ["US"],
                                "optional": [
                                    "01",
                                    "02"
                                ]
                            }
                        },
                        "output_type": {
                            "mean": {
                                "output_type_id": {
                                    "required": null,
                                    "optional": ["NA"]
                                },
                                "value": {
                                    "type": "double",
                                    "minimum": 0
                                }
                            }
                        },
                        "target_metadata": [{
                            "target_id": "wk ahead inc flu hosp",
                            "target_name": "weekly influenza hospitalization incidence",
                            "target_units": "rate per 100,000 population",
                            "target_keys": {
                                "target": ["wk ahead inc flu hosp"]
                            },
                            "target_type": "discrete",
                            "description": "This target represents the counts of new hospitalizations per horizon week.",
                            "is_step_ahead": true,
                            "time_unit": "week"
                        }]
                    }],
                    "submissions_due": {
                        "start": "2022-12-20",
                        "end": "2022-12-28"
                    }
                }
            ]
        }
  • One way to get around this would be to suppress the standard submission window validation and create a custom function that validates submission windows from epiweek codes. It remains to be determined whether all this effort is worthwhile, especially given that just using dates would make both configuration and validation simpler.
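To give a sense of what such a custom function might involve, here is a minimal Python sketch (not part of hubValidations, and not how the hubverse validates windows). It assumes epiweek codes follow the CDC MMWR week definition (weeks run Sunday to Saturday, and week 1 is the first week with at least four days in the new year) and that each round's reference date is the Monday of that week, which is how the `origin_date` values 2022-12-12, 2022-12-19 and 2022-12-26 line up with `2022-50` to `2022-52`. All function names are hypothetical.

```python
from datetime import date, timedelta


def mmwr_week_start(year: int, week: int) -> date:
    """Return the Sunday on which MMWR epiweek `week` of `year` starts."""
    jan1 = date(year, 1, 1)
    # Days since the most recent Sunday (Sunday=0, Monday=1, ..., Saturday=6).
    dow = (jan1.weekday() + 1) % 7
    if dow <= 3:
        # Jan 1 is Sun-Wed: its week has at least 4 days in `year`, so it is week 1.
        week1_start = jan1 - timedelta(days=dow)
    else:
        # Jan 1 is Thu-Sat: week 1 starts on the following Sunday.
        week1_start = jan1 + timedelta(days=7 - dow)
    return week1_start + timedelta(weeks=week - 1)


def epiweek_to_origin_date(code: str) -> date:
    """Map a 'YYYY-WW' epiweek code to the Monday of that week (assumed origin date)."""
    year, week = (int(part) for part in code.split("-"))
    return mmwr_week_start(year, week) + timedelta(days=1)


def submission_window(origin: date, start: int = -6, end: int = 2) -> tuple[date, date]:
    """Mirror a relative window such as {"relative_to": "origin_date", "start": -6, "end": 2}."""
    return origin + timedelta(days=start), origin + timedelta(days=end)


def is_on_time(code: str, submitted: date) -> bool:
    """Hypothetical custom check: is `submitted` inside the window for epiweek `code`?"""
    window_start, window_end = submission_window(epiweek_to_origin_date(code))
    return window_start <= submitted <= window_end


if __name__ == "__main__":
    # Prints one "code start end" line per round, e.g. "2022-50 2022-12-06 2022-12-14".
    for code in ["2022-50", "2022-51", "2022-52"]:
        print(code, *submission_window(epiweek_to_origin_date(code)))
```

Run as a script, this reproduces the three windows that the epiweek config spells out by hand. Note that the week definition and the "Monday of the epiweek" convention are assumptions a hub would have to document and keep in sync with its config, which is part of the extra effort being weighed here.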

Work Required

If we do choose to go ahead and support non-date round IDs, the main work would be in modifying hubValidations::parse_file_name() to recognise and match non-date round IDs.
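As an illustration of what "recognising" a non-date round ID could mean, here is a minimal Python sketch. It is not the hubValidations implementation (which is an R package); it assumes filenames follow `<round_id>-<model_id>.<ext>`, that an ISO date round ID is tried first, and that a non-date round ID contains only alphanumerics and underscores, so it cannot contain a hyphen. The helper `parse_model_output_filename` is hypothetical.

```python
import re
from pathlib import PurePosixPath

# ISO date round IDs such as "2024-05-15" contain hyphens themselves, so try them first.
DATE_ROUND_ID = re.compile(r"([0-9]{4}-[0-9]{2}-[0-9]{2})-(.+)")
# Assumed convention for non-date round IDs: alphanumerics and underscores only.
NON_DATE_ROUND_ID = re.compile(r"([A-Za-z0-9_]+)-(.+)")


def parse_model_output_filename(path: str) -> dict:
    """Split '<round_id>-<model_id>.<ext>' into its parts (hypothetical helper)."""
    name = PurePosixPath(path).name
    # Everything after the first dot is treated as the extension, e.g. "gz.parquet".
    stem, _, ext = name.partition(".")
    match = DATE_ROUND_ID.fullmatch(stem) or NON_DATE_ROUND_ID.fullmatch(stem)
    if match is None:
        raise ValueError(f"cannot parse round_id and model_id from {name!r}")
    round_id, model_id = match.groups()
    return {"round_id": round_id, "model_id": model_id, "ext": ext}
```

For example, `model-output/team2-modelb/2024-05-15-team2-modelb.gz.parquet` parses to round `2024-05-15` and model `team2-modelb`, and a hypothetical `round_a-team2-modelb.csv` parses to round `round_a`. An epiweek code written with a hyphen, such as `2022-50-team2-modelb.csv`, is silently split into round `2022` and model `50-team2-modelb`, which shows why the convention for non-date round IDs matters.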

If we decide we will not support non-date round IDs, we need to update hubDocs to reflect that.

@nickreich
Contributor

I would be in favor of not adding support for non-date round IDs for now, and only supporting round IDs that are in the format of dates. Are there clear use cases where supporting non-date round IDs would be useful?

@annakrystalli
Member Author

@LucieContamin wrote in https://github.com/orgs/Infectious-Disease-Modeling-Hubs/discussions/7#discussioncomment-9236827

I am not sure I totally understand the issue here, sorry. But, for SMH, we mainly use origin_date as round_id. However, we have some rounds where the round_id is not the origin_date and is only used in the filename, to be able to tag which file corresponds to which round. In this case, the format of round_id does not matter a lot. We still use a YYYY-MM-DD format to follow the same "style" as the other rounds. Does that answer your question, or help?

Could you share an example of what such round_ids look like? It does matter what they contain: we need to be able to consistently separate round_id from model_id when parsing filenames, and how easily we can do that depends on whether we follow certain conventions in how we specify round_ids (if they are not dates).

Additionally, in the rounds where round_ids are not the origin date, what value does origin date contain in the files?

Would be super curious to see an example of both the tasks.json and some files (including filenames) of what you describe!

@LucieContamin
Contributor

The round_id we are using is still in the ISO Date format: "YYYY-MM-DD", for example:

    "round_id": "2024-05-15",
    "round_id_from_variable": false,
    "model_tasks": [
      {
        "task_ids": {
          "origin_date": {
            "required": ["2020-11-15"],
            "optional": null
          }, ....

So, the filenames follow the "usual" format, for example: model-output/team2-modelb/2024-05-15-team2-modelb.gz.parquet.

I am happy to provide more information and examples, if necessary. I can also give you the link to the repository for these rounds: https://github.com/midas-network/covid19-smh-research

@annakrystalli
Member Author

Thank you @LucieContamin !

OK so it still is a date so still not an example of a non date round_id! 😜

Out of curiosity, what made you configure some rounds one way and some the other?

@LucieContamin
Contributor

Ah yes, still a date, but as I use it only for tracking files, it could have been anything, I guess. It's not used for anything else.

We decided to configure it like this, because we have two rounds with the same origin_date so we needed to use something else for round_id.

@annakrystalli
Copy link
Member Author

Very useful context, thanks. I guess if we were to support non-date round IDs, as long as they only contain alphanumerics and underscores (_), I believe our current systems would work (see deep dive here).

And you still have origin_date in your files so you have dates to match to target data and plot. It's when that date information is not included that issues can arise.
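A convention like "only alphanumerics and underscores" could be checked when a hub's config is validated, rather than surfacing later as a filename parsing problem. Below is a minimal Python sketch, assuming the rule is "either a real ISO date (YYYY-MM-DD) or only alphanumerics and underscores"; `is_valid_round_id` is a hypothetical name, not an existing hubValidations function.

```python
import re
from datetime import date

# Assumed accepted forms for a round ID.
ISO_DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")
NON_DATE_ROUND_ID = re.compile(r"[A-Za-z0-9_]+")


def is_valid_round_id(round_id: str) -> bool:
    """True if round_id is a real ISO date or a hyphen-free alphanumeric/underscore ID."""
    if ISO_DATE.fullmatch(round_id):
        try:
            # Rejects strings shaped like dates that are not real dates, e.g. "2024-02-30".
            date.fromisoformat(round_id)
            return True
        except ValueError:
            return False
    return NON_DATE_ROUND_ID.fullmatch(round_id) is not None
```

With this rule, `2024-05-15` and `round_1` pass, while a hyphenated epiweek code such as `2022-50` is rejected up front.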

@zkamvar
Member

zkamvar commented Oct 25, 2024

fixed in #133
