-
Notifications
You must be signed in to change notification settings - Fork 35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(models): add output_dir option to model_download #178
base: main
Are you sure you want to change the base?
feat(models): add output_dir option to model_download #178
Conversation
src/kagglehub/models.py
Outdated
"""Download model files. | ||
|
||
Args: | ||
handle: (string) the model handle. | ||
path: (string) Optional path to a file within the model bundle. | ||
force_download: (bool) Optional flag to force download a model, even if it's cached. | ||
|
||
output_dir: (str) Optional path to copy model files to after successful download. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should integrate the default path here instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rosbo - The parameters for this fn is starting to grow. Any thoughts on managing this? Should we use wrap this in a settings/configuration object instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just my nitpick opinion: I don't think the explicit default path here is necessary since it circumvents the general cache behavior, which is encapsulated in the http resolver.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want this method to be easy to use and I feel like adding a settings object would complicate it more than anything. Users will use keyword argument for these and likely set no parameters or at most 1-2.
If we compare to huggingface_hub.snapshot_download(...)
, we don't have many arguments... https://github.com/huggingface/huggingface_hub/blob/4011b5a2836d7bb036d8da54ed656f88bc0d2f7f/src/huggingface_hub/_snapshot_download.py#L21-L45
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sg.
"""Download model files. | ||
|
||
Args: | ||
handle: (string) the model handle. | ||
path: (string) Optional path to a file within the model bundle. | ||
force_download: (bool) Optional flag to force download a model, even if it's cached. | ||
|
||
output_dir: (string) Optional path to copy model files to after successful download. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the user's expectation would be that files are downloaded directly to this output_dir and the cache folder is skipped entirely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
BUG=b/377340841 (for googlers)
Adds support for an
output_dir
argument, similar to HuggingFace Hub'slocal_dir
.This change is primarily motivated by this torchtune RFC: pytorch/torchtune#1852 (comment)
While we could use a stopgap solution and copy files within torchtune, this behavior is general enough that direct
kagglehub
users could benefit from it.