Suggested - allowing cache mechanism for files #777

Open
guruyaya opened this issue Aug 9, 2023 · 1 comment

guruyaya commented Aug 9, 2023

Fetching files from remote servers can take quite some time. It may be beneficial to add an option to cache remote requests. I'd suggest adding a cache_file and/or cache_dir parameter to smart_open, for both writes and reads. On a write, the cache_file param would save a local copy of the file at that path. cache_dir would use the sha256 of the resource as the name of the locally saved file. By default, if writing to the cache fails, it should just emit a warning. The cache should also carry a cleanup date.
Writing a file would apply to both the remote and the local resource. If the local file already exists, it would be deleted first, to avoid permission issues. Reading a file would first look in the cache folder. If the file is found there, its creation date would be checked, and if it is too old, it would be deleted and replaced with the remote file.
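
To make the read path concrete, here is a rough sketch of how the cache_dir lookup could behave. It is not smart_open code: `cache_path`, `cached_read`, `max_age_seconds`, and the `fetch` callable are all hypothetical names, the sha256 is taken over the URI string (one possible reading of "sha256 of the resource"), and the file's modification time stands in for the creation date because it is more portable.

```python
import hashlib
import time
import warnings
from pathlib import Path
from typing import Callable


def cache_path(cache_dir: str, uri: str) -> Path:
    """Name the cached copy after the sha256 of the URI (an assumption)."""
    digest = hashlib.sha256(uri.encode("utf-8")).hexdigest()
    return Path(cache_dir) / digest


def cached_read(uri: str, cache_dir: str, max_age_seconds: float,
                fetch: Callable[[str], bytes]) -> bytes:
    """Return the bytes for `uri`, preferring a fresh local copy.

    `fetch` is a hypothetical callable that downloads the remote resource.
    """
    path = cache_path(cache_dir, uri)
    if path.exists():
        # mtime is used here as a stand-in for the creation date.
        age = time.time() - path.stat().st_mtime
        if age <= max_age_seconds:
            return path.read_bytes()
        path.unlink()  # too old: delete it, then refetch below

    data = fetch(uri)
    try:
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)
    except OSError as exc:
        # A failed cache write only warns; the remote data is still returned.
        warnings.warn(f"could not write cache file {path}: {exc}")
    return data
```

With smart_open installed, `fetch` could be something like `lambda u: smart_open.open(u, "rb").read()`, so a call would look like `cached_read(uri, "/tmp/cache", 3600, fetch)`.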


stuaxo commented Dec 11, 2024

Some kinds of remote servers (such as S3) support ETags, so it would make sense to store the ETag too.

You can just send an HTTP HEAD request to get info such as the ETag.
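
As a rough illustration of the HEAD-request idea, the sketch below uses only the Python standard library. `fetch_etag` and `cache_is_current` are hypothetical helper names, not part of smart_open, and the comparison is a plain string match on the stored and remote ETag values.

```python
import urllib.request
from typing import Optional


def fetch_etag(url: str, timeout: float = 10.0) -> Optional[str]:
    """Send an HTTP HEAD request and return the ETag header, if any."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.headers.get("ETag")


def cache_is_current(cached_etag: Optional[str],
                     remote_etag: Optional[str]) -> bool:
    """Treat the cached copy as valid only when both ETags exist and match."""
    if cached_etag is None or remote_etag is None:
        return False
    return cached_etag == remote_etag
```

A cache could store the ETag next to the cached file and refetch only when `cache_is_current` returns False, which avoids relying on file age alone.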
