Design Qs for making get_data (data curl script) customizable #22

OliviaLynn · 2023-07-02T23:36:51Z

The CLI script get_data was added in PR #20.

It currently gets one file from NERSC, but we would like to be able to (1) get more files and (2) allow the user to specify a subset of files.

Design questions:

Will we be grabbing data from places other than NERSC? If the same file is available in two places, do we let the user specify which one to target?
Will we allow users to specify subsets of data on a file-by-file basis, or will we group the files into logical subsets ourselves (i.e., "download all the files needed to run <something>")?
Will we walk the user through the available data with a y/n prompt (using the prompting feature in click, we could step through each available file for convenience), or will we ask the user to type or copy-paste each file name?
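
To make the y/n option concrete, here is a minimal sketch of what the walk-through could look like with click's `confirm` prompt. The file names in `AVAILABLE_FILES` and the helper `choose_files` are hypothetical placeholders for illustration, not what the script from PR #20 actually contains.

```python
import click

# Hypothetical catalogue: the real file names on NERSC are not listed in this thread.
AVAILABLE_FILES = ["example_a.hdf5", "example_b.hdf5", "example_c.hdf5"]


def choose_files(available, confirm=click.confirm):
    """Ask one yes/no question per file and return the names the user accepted."""
    return [name for name in available if confirm(f"Download {name}?", default=True)]


@click.command()
def get_data():
    """Walk through every available file and report what was selected."""
    selected = choose_files(AVAILABLE_FILES)
    click.echo(f"Selected {len(selected)} file(s): {', '.join(selected) or 'none'}")
```

Passing `confirm` as a parameter is only there so the selection logic can be exercised without a terminal. With more than a handful of files this prompt-per-file flow would probably get tedious, which is one argument for the grouped-subset option instead.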

aimalz · 2023-07-13T22:15:24Z

I think the tests and demos will not be getting data from other places.
From the user perspective, I'd ideally want files to be downloaded at the beginning of a pipeline (in a script or notebook), up by the imports, and to download only what that script/notebook needs (preferably skipping any files I already have).
In the use case of running it at the beginning of a pipeline notebook/script, the user won't know which files they need, so I think pipeline scripts/notebooks we provide will need to include a list of the necessary file names to grab in the call to get_data, no?
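
A rough sketch of that usage, assuming a hypothetical helper `fetch_missing` (not the current get_data signature) and a placeholder `BASE_URL`, since the real NERSC location is not given in this thread. It takes an explicit list of file names and skips anything already on disk.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Placeholder base URL: the real NERSC location is not given in this thread.
BASE_URL = "https://example.org/data/"


def fetch_missing(file_names, dest_dir="data", base_url=BASE_URL, download=urlretrieve):
    """Download each named file into dest_dir unless it is already there.

    Returns the list of names that were actually downloaded.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    downloaded = []
    for name in file_names:
        target = dest / name
        if target.exists():
            continue  # already on disk, so skip the transfer
        download(base_url + name, target)
        downloaded.append(name)
    return downloaded
```

A notebook we provide could then start with something like `fetch_missing(["file_one.hdf5", "file_two.hdf5"])` right after its imports, where the two names stand in for whatever that notebook needs. Note that the existence check only looks at the file name, so a partially downloaded file would be treated as present.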

OliviaLynn added the "question" label (Further information is requested) · Jul 2, 2023

OliviaLynn mentioned this issue in "get data as a cli script" #20 (merged) · Jul 2, 2023
