Design Qs for making get_data (data curl script) customizable #22

Open
OliviaLynn opened this issue Jul 2, 2023 · 1 comment
Labels
question Further information is requested

Comments

@OliviaLynn
Member

The CLI script get_data was added in PR #20.

It currently fetches one file from NERSC, but we would like to be able to (1) fetch more files and (2) let the user specify a subset of the files.

Design questions:

  1. Will we be grabbing data from places other than NERSC? If the same file is available in two places, do we let the user specify which one to target?
  2. Will we let users specify subsets of data on a file-by-file basis, or will we group the files into predefined logical subsets (i.e., "download all the files needed to run <something>")?
  3. Will we walk the user through the available data with a y/n prompt (using click's prompting feature, we could step through each available file for convenience), or will we ask the user to type or copy-paste each file name?
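To make questions 2 and 3 concrete, here is a minimal Python sketch of how file selection could support both a named subset and a per-file y/n walk-through. Everything in it is hypothetical: the manifest AVAILABLE_FILES, the subset names in SUBSETS, and the helper select_files are placeholders, since this issue does not list the real NERSC file names. The default prompt uses plain input(); in the CLI, click.confirm could be passed in as the confirm callable instead.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical manifest: the real NERSC file names are not listed in this issue.
AVAILABLE_FILES: List[str] = ["file_a.hdf5", "file_b.hdf5", "file_c.hdf5"]

# Hypothetical named subsets (question 2): one logical group per task.
SUBSETS: Dict[str, List[str]] = {
    "demo": ["file_a.hdf5"],
    "full": AVAILABLE_FILES,
}


def prompt_yes_no(question: str) -> bool:
    """Minimal stdlib y/n prompt; click.confirm could fill this role in the CLI."""
    return input(f"{question} [y/N]: ").strip().lower() in ("y", "yes")


def select_files(subset: Optional[str] = None,
                 confirm: Callable[[str], bool] = prompt_yes_no) -> List[str]:
    """Return the list of files to download.

    With a named subset, return that group directly (question 2).
    Without one, ask about every available file in turn (question 3).
    """
    if subset is not None:
        if subset not in SUBSETS:
            raise ValueError(f"unknown subset {subset!r}; choose from {sorted(SUBSETS)}")
        return list(SUBSETS[subset])
    return [name for name in AVAILABLE_FILES if confirm(f"Download {name}?")]
```

Passing the confirm callable in keeps the selection logic separate from the prompting, so the CLI could hand it click.confirm while tests hand it a stub.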
@OliviaLynn OliviaLynn added the question Further information is requested label Jul 2, 2023
@OliviaLynn OliviaLynn mentioned this issue Jul 2, 2023
@aimalz
Collaborator

aimalz commented Jul 13, 2023

  1. I think the tests and demos won't need data from anywhere other than NERSC.
  2. From the user's perspective, I'd ideally want files to be downloaded at the beginning of a pipeline script or notebook, right up with the imports, and to download only what that script or notebook needs (preferably skipping any files I already have).
  3. In the use case of running it at the beginning of a pipeline script or notebook, the user won't know which files they need, so I think the pipeline scripts and notebooks we provide will need to include a list of the required file names in the call to get_data, no?
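Along the lines of points 2 and 3, a notebook could call a download helper near its imports with an explicit list of the files it needs. Below is a minimal Python sketch of a hypothetical get_data() function (not the actual script from PR #20). It assumes a placeholder BASE_URL, since the real NERSC location isn't given here, skips any file already present in the destination directory, and downloads the rest with urllib.request.urlretrieve. It only checks that a file exists, so a partially downloaded file would also be skipped.

```python
import urllib.request
from pathlib import Path
from typing import Callable, Iterable, List

# Hypothetical location; the actual NERSC URL is not given in this issue.
BASE_URL = "https://example.org/data/"


def _download(url: str, dest: Path) -> None:
    """Fetch a single file over HTTP(S) into dest."""
    urllib.request.urlretrieve(url, dest)


def get_data(file_names: Iterable[str], dest_dir: str = "data",
             fetch: Callable[[str, Path], None] = _download) -> List[str]:
    """Download each requested file unless it already exists locally.

    Returns the names of the files that were actually fetched.
    """
    out = Path(dest_dir)
    out.mkdir(parents=True, exist_ok=True)
    fetched = []
    for name in file_names:
        target = out / name
        if target.exists():
            continue  # skip files the user already has (existence check only)
        fetch(BASE_URL + name, target)
        fetched.append(name)
    return fetched
```

A provided notebook would then start with something like get_data(["file_a.hdf5", "file_b.hdf5"]) (hypothetical file names), and rerunning that cell fetches nothing once the files are present.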
