Replace team_abbr and model_abbr output columns with model_id #8

bsweger · 2024-05-07T18:38:31Z

Resolves #2

Given @lmullany's findings when converting the archived FluSight repo into hubverse format, we decided to simplify how we parse model-output file names.

TL;DR: instead of creating separate team_abbr and model_abbr columns in the transformed parquet file, we're going to create a single model_id column (see the linked issue for more details).

This PR also sneaks in a minor change to logging strings based on feedback from @matthewcornell.

Matt rightly pointed out that strings are more readable this way (and it turns out we're using f-strings in the logger throughout the rest of the code base anyway!)

lmullany · 2024-05-09T14:59:30Z

src/hubverse_transform/model_output.py

+        model_id_split = re.split(rf"{round_id}[-_]*", file_name)
+        if not model_id_split or len(model_id_split) <= 1 or not model_id_split[-1]:
+            raise ValueError(f"Unable to get model_id from file name {file_name}.")
+        model_id = "".join(model_id_split[-1].split())


This removes any whitespace in whatever remains after the round_id. Is that your intent? So 2022-12-31-my team-my model.parquet would return "myteam-mymodel" for model_id.

bsweger · 2024-05-10

Good call-out. It was my intent at the time I made the pull request, but I have another in the queue (started after Wednesday's dev meeting) that retains the spaces and adds some corresponding test cases.
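For anyone skimming this thread later, here is a minimal sketch of the behavior under discussion. It assumes `file_name` is the file stem (no `.parquet` extension) and `round_id` is the leading date. The function names are hypothetical, and `model_id_with_spaces` is only a guess at what a space-retaining version might look like, not the code from the follow-up PR.

```python
import re


def model_id_stripped(file_name: str, round_id: str) -> str:
    """Mirror the logic in this diff: drop all whitespace from the model_id."""
    model_id_split = re.split(rf"{round_id}[-_]*", file_name)
    if not model_id_split or len(model_id_split) <= 1 or not model_id_split[-1]:
        raise ValueError(f"Unable to get model_id from file name {file_name}.")
    # str.split() with no arguments splits on any run of whitespace,
    # so joining the pieces removes every space, tab, and newline.
    return "".join(model_id_split[-1].split())


def model_id_with_spaces(file_name: str, round_id: str) -> str:
    """Hypothetical variant: keep internal spaces, trim only the ends."""
    model_id_split = re.split(rf"{round_id}[-_]*", file_name)
    if len(model_id_split) <= 1 or not model_id_split[-1].strip():
        raise ValueError(f"Unable to get model_id from file name {file_name}.")
    return model_id_split[-1].strip()


if __name__ == "__main__":
    stem = "2022-12-31-my team-my model"
    print(model_id_stripped(stem, "2022-12-31"))
    print(model_id_with_spaces(stem, "2022-12-31"))
```

Both versions raise ValueError when nothing follows the round_id, so the difference is limited to how whitespace inside the remaining text is handled.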

lmullany

Looks good to me.

bsweger added 2 commits May 7, 2024 14:38

Replace team_abbr and model_abbr output columns with model_id

ce8ddae

Use f-strings in lambda_handler logger

0bee296

Matt rightly pointed out that strings are more readable this way (and it turns out we're using f-strings in the logger throughout the rest of the code base anyway!)

lmullany reviewed May 9, 2024

lmullany approved these changes May 9, 2024

bsweger merged commit 70e7257 into main May 10, 2024
1 check passed

bsweger deleted the bsweger/merge-team-and-model-cols branch May 30, 2024 19:12
