Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add model/dataset cache to GH actions #26

Merged
merged 5 commits into from
Feb 13, 2024
Merged

Add model/dataset cache to GH actions #26

merged 5 commits into from
Feb 13, 2024

Conversation

jaidhyani
Copy link
Collaborator

@jaidhyani jaidhyani commented Feb 9, 2024

This PR enables caching for huggingface models/datasets (#1). It does require users to manually increment a cache identifier when we want to change what's going into the cache, since I couldn't think of a good way to automatically check what will be needed in the pipeline ahead of time. Instructions have been added to README.md.

@jaidhyani jaidhyani requested a review from jettjaniak February 9, 2024 17:12
@jettjaniak
Copy link
Contributor

Amazing, thanks! Can you cache dependencies as well? Currently they're the bottleneck I believe.

@jettjaniak
Copy link
Contributor

bizarre that README got formatted by something. Did you use something else than text editor?

@jaidhyani
Copy link
Collaborator Author

Ah, I think it was Prettier. I've disabled it for this workspace.

@jaidhyani jaidhyani linked an issue Feb 11, 2024 that may be closed by this pull request
@jaidhyani
Copy link
Collaborator Author

Amazing, thanks! Can you cache dependencies as well? Currently they're the bottleneck I believe.

Added via the integrated caching functionality in setup-python. This should just work, testing it now.

@jaidhyani
Copy link
Collaborator Author

So good news/bad news:

Good news: This does work, pip packages are saved and don't need to be downloaded again. First run writes to cache, second run reads from cache.

Bad news: This only saves about 30 seconds, most of the time is spent on the actual install

Good news: It's theoretically possible to save the post-install state, see https://archive.ph/WAex2

Bad news: This potentially breaks a lot of integrations by Doing Things In An Unexpected Way. And discussion of the relevant open issue on the action itself suggests that people haven't really gotten too much benefit out of it: actions/setup-python#330

Tentative conclusion: Take the small win and revisit later if and only if this is enough of a blocker that we're willing to dedicate a lot of person-hours to improving it instead of working on something else.

Copy link
Contributor

@jettjaniak jettjaniak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed with the conclusion, thanks!

@jaidhyani jaidhyani merged commit c72d4aa into main Feb 13, 2024
1 check passed
@jaidhyani jaidhyani deleted the issue1 branch March 20, 2024 00:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

use cache in github actions
2 participants