consider updating documentation about model output folder to use model_id1, model_id2 and double check file names #114

Closed
elray1 opened this issue Apr 25, 2024 · 7 comments
Labels
documentation Improvements or additions to documentation

Comments

@elray1
Contributor

elray1 commented Apr 25, 2024

Looking at this page: https://hubverse.io/en/latest/user-guide/model-output.html

Currently the folder and file structure are listed as follows:

  • team1-modela
    • <round-id1>.csv (or parquet, etc)
    • <round-id2>.csv (or parquet, etc)
  • team1-modelb
    • <round-id1>.csv (or parquet, etc)
  • team2-modela
    • <round-id1>.csv (or parquet, etc)

Two comments about this:

  • This implies a specific structure for model ids as <team_abbr>-<model_abbr>, but it may be clearer to just indicate here that the folder names correspond to model_ids, and we can discuss conventions about composition of model_id elsewhere.
  • Do file names have the format <round_id>.csv, or <round_id>-<model_id>.csv? I think we've decided to include model_id as a check that submissions landed in the right folder, but I'm not sure.
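A rough sketch of the "landed in the right folder" check, assuming file names follow `<round_id>-<model_id>.<ext>` and the parent folder is named exactly `<model_id>`. The function name `file_matches_folder` and the `model-output/` prefix in the usage note are illustrative only, not part of any hubverse tooling:

```python
from pathlib import PurePosixPath


def file_matches_folder(path: str) -> bool:
    """Loosely check that a model-output file sits in the folder named after its model_id.

    Assumes (hypothetically) the layout <model_id>/<round_id>-<model_id>.<ext>.
    """
    p = PurePosixPath(path)
    model_id = p.parent.name  # the folder name is taken to be the model_id
    suffix = "-" + model_id
    # The stem must end with "-<model_id>" and have a non-empty round_id before it.
    return p.stem.endswith(suffix) and len(p.stem) > len(suffix)
```

For example, `file_matches_folder("model-output/team1-modela/2024-04-22-team1-modela.csv")` passes, while the same file placed under `team1-modelb/` fails. This is only a suffix comparison, so it does not validate the round_id itself.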
@elray1 elray1 changed the title from "consider updating documentation about model output folder to use model_id1, model_id2" to "consider updating documentation about model output folder to use model_id1, model_id2 and double check file names" Apr 25, 2024
@nickreich
Contributor

  1. I think that we have elsewhere indicated that <model_id> == <team_abbr>-<model_abbr> and that teams can choose one representation to use, as indicated in their model metadata schema file.

  2. I don't recall the specifics of that decision, but I support <round_id>-<model_id>.csv or .parquet as a file format.

@bsweger
Contributor

bsweger commented Apr 25, 2024

Thanks for raising this!

My .02 on the first question, mostly from the perspective of how we'll move hub data to the cloud and open it up to a non-hubverse audience.

> This implies a specific structure for model ids as <team_abbr>-<model_abbr>, but it may be clearer to just indicate here that the folder names correspond to model_ids, and we can discuss conventions about composition of model_id elsewhere.

Removing the separate model-abbr and team-abbr columns from the "cloud transformed" model-output files in favor of a single model_id column simplifies the data conversion process. It does put the onus of parsing out team/model on data consumers, but I think it makes sense to favor the simple approach and revisit if we get feedback.
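For a data consumer who does need the two parts, a minimal sketch of parsing them back out of a single model_id value might look like the following. It assumes the `<team_abbr>-<model_abbr>` convention mentioned above and further assumes that neither abbreviation contains a hyphen; `split_model_id` is a hypothetical helper name:

```python
def split_model_id(model_id: str) -> tuple[str, str]:
    """Split a model_id into (team_abbr, model_abbr).

    Assumption: model_id is "<team_abbr>-<model_abbr>" and neither part
    contains a hyphen, so exactly one hyphen is expected.
    """
    team_abbr, sep, model_abbr = model_id.partition("-")
    if not sep or not team_abbr or not model_abbr or "-" in model_abbr:
        raise ValueError(f"cannot split model_id {model_id!r} into team and model")
    return team_abbr, model_abbr
```

If a hub allowed hyphens inside either abbreviation, this split would be ambiguous and the team/model pair would have to come from model metadata instead.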

@bsweger
Contributor

bsweger commented Apr 25, 2024

> I don't recall the specifics of that decision, but I support <round_id>-<model_id>.csv or .parquet as a file format.

Agree with @nickreich's comment re: item 2 (especially if we agree to make YYYY-MM-DD the required format for round_id, since that creates a definitive way to parse out round and model from a model-output filename).

Again, this is from the perspective of a cloud-enabled hub. While model_id could be obtained via "directory" structure or from a column in the actual file, I can see how it would be handy to have that information encoded in the filename, especially if people lose the directory structure context when downloading data.
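To illustrate why a fixed YYYY-MM-DD round_id makes the filename unambiguous, here is a sketch that splits a model-output filename into round_id and model_id. It assumes the `<round_id>-<model_id>.<ext>` format under discussion; the regex and the name `parse_model_output_name` are illustrative and are not taken from hubverse-transform:

```python
import re
from pathlib import PurePosixPath

# Assumption: round_id is always a 10-character YYYY-MM-DD date, so everything
# after the hyphen that follows it is treated as the model_id.
FILENAME_RE = re.compile(r"^(?P<round_id>\d{4}-\d{2}-\d{2})-(?P<model_id>.+)$")


def parse_model_output_name(filename: str) -> dict[str, str]:
    """Return {'round_id': ..., 'model_id': ...} for a model-output filename."""
    stem = PurePosixPath(filename).stem  # drop any directories and the file extension
    match = FILENAME_RE.match(stem)
    if match is None:
        raise ValueError(f"unexpected model-output filename: {filename!r}")
    return match.groupdict()
```

Because only the file name is inspected, this works even when the directory structure has been lost, for example after downloading a single file. Without a fixed-width round_id, hyphens inside the model_id would make the split point ambiguous.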

@bsweger
Contributor

bsweger commented May 3, 2024

It's been a week since anyone has chimed in, so I'm going to assume that we'll proceed with @nickreich and @elray1's suggestions above:

  1. model-output filenames will be in <round_id>-<model_id> format
  2. instead of trying to parse out model name and team name separately, the function that transforms cloud-based model-output files will generate a single column called model_id that contains anything after round_id in the filename

#2 reflects hubverse-transform work to address the latter.

@mzorn-58 mzorn-58 added the documentation Improvements or additions to documentation label Jun 5, 2024
@mzorn-58 mzorn-58 self-assigned this Jun 5, 2024
@mzorn-58
Contributor

mzorn-58 commented Jun 5, 2024

This page: now shows structure as

image

OK to close issue? @elray1 @nickreich

@mzorn-58 mzorn-58 moved this from Todo to Ready for Review in hubverse Development overview Jun 5, 2024
@mmkerr
Contributor

mmkerr commented Jun 14, 2024

this is similar to an issue Anna raised in closed issue #116

@nickreich
Contributor

Agree that this can be closed.

@github-project-automation github-project-automation bot moved this from Ready for Review to Done in hubverse Development overview Jun 27, 2024
5 participants