Replace team_abbr and model_abbr output columns with model_id #8

Merged: 2 commits, May 10, 2024

Conversation

Collaborator

@bsweger bsweger commented May 7, 2024

Resolves #2

Given @lmullany's findings when converting the archived FluSight repo into hubverse format, we decided to simplify how we parse model-output file names.

TL;DR: instead of creating separate team_abbr and model_abbr columns in the transformed parquet file, we'll create a single model_id column (see the linked issue for more details).

This PR also sneaks in a minor change to logging strings, based on feedback from @matthewcornell.

bsweger added 2 commits May 7, 2024 14:38
Matt rightly pointed out that strings are more readable this
way (and it turns out we're using f-strings in the logger
throughout the rest of the code base anyway!)
model_id_split = re.split(rf"{round_id}[-_]*", file_name)
if not model_id_split or len(model_id_split) <= 1 or not model_id_split[-1]:
    raise ValueError(f"Unable to get model_id from file name {file_name}.")
model_id = "".join(model_id_split[-1].split())
Collaborator

This removes any spaces from whatever remains after the split. Is that your intent? For example, 2022-12-31-my team-my model.parquet would return "myteam-mymodel" for model_id.

Collaborator Author

Good call-out. It was my intent at the time I made the pull request, but I have another in the queue (started after Wednesday's dev meeting) that retains the spaces and adds some corresponding test cases.
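
For anyone following along, here is a minimal sketch of the behavior discussed in this thread. It wraps the lines under review in a hypothetical helper, `parse_model_id`, and assumes `file_name` is the file stem (no `.parquet` extension). The `keep_spaces` flag and the `re.escape` call are illustrative additions, not part of this PR, and the space-retaining branch is only a guess at what the follow-up PR might do.

```python
import re


def parse_model_id(file_name: str, round_id: str, keep_spaces: bool = False) -> str:
    """Hypothetical helper wrapping the parsing lines under review."""
    # Split on the round_id plus any hyphens/underscores that follow it.
    # re.escape is an addition here so round_id is matched literally.
    model_id_split = re.split(rf"{re.escape(round_id)}[-_]*", file_name)
    if not model_id_split or len(model_id_split) <= 1 or not model_id_split[-1]:
        raise ValueError(f"Unable to get model_id from file name {file_name}.")
    remainder = model_id_split[-1]
    if keep_spaces:
        # Assumed alternative: keep interior spaces, trim only the ends.
        return remainder.strip()
    # Behavior in this PR: drop every whitespace character.
    return "".join(remainder.split())


stem = "2022-12-31-my team-my model"
print(parse_model_id(stem, "2022-12-31"))                    # spaces removed
print(parse_model_id(stem, "2022-12-31", keep_spaces=True))  # spaces retained
```

A file name that does not contain the round_id, or that has nothing after it, raises the same ValueError as the reviewed code.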

Collaborator

@lmullany lmullany left a comment

Looks good to me.

@bsweger bsweger merged commit 70e7257 into main May 10, 2024
1 check passed
@bsweger bsweger deleted the bsweger/merge-team-and-model-cols branch May 30, 2024 19:12
Development

Successfully merging this pull request may close these issues.

Update round, team, and model parsing in model-output transform function
2 participants