Use cached data for tests? #164
If you have ideas on how to do it, feel free to make a PR.
I have some. @tfardet do you want to do it or do I assign myself?
Haven't had the time to work on it yet (and I'm leaving for 15 days). @tfardet if you have the time to give it a try before I'm back, don't hesitate :-)
Ok, I think my solution is working (on my local machine for now). I use mockups of … I had some trouble due to multiprocessing being used in …, and I have some questions:
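For illustration, a minimal sketch of the mockup/monkeypatch idea described above, assuming the package issues its HTTP calls through `requests.Session.request` (that entry point, the cache layout and the file names are assumptions, not pynsee's actual implementation):

```python
# Sketch only: cache-backed mockups of HTTP calls for pytest, assuming the
# package goes through requests.Session.request (the real entry point may differ).
import json
import pathlib

import requests

CACHE_DIR = pathlib.Path("tests/cache")  # hypothetical location of cached JSON payloads


class FakeResponse:
    """Bare-bones stand-in for requests.Response, sufficient for most tests."""

    status_code = 200

    def __init__(self, payload):
        self._payload = payload
        self.text = json.dumps(payload)

    def json(self):
        return self._payload


def fake_request(self, method, url, **kwargs):
    """Serve the response from a cached JSON file instead of hitting the API."""
    key = url.replace("://", "_").replace("/", "_")  # naive cache key
    return FakeResponse(json.loads((CACHE_DIR / f"{key}.json").read_text()))


def test_some_pynsee_call(monkeypatch):
    # pytest's monkeypatch fixture swaps the real network call for the fake one
    monkeypatch.setattr(requests.Session, "request", fake_request)
    # ... call the pynsee function under test here and assert on the result ...
```

Note that such a patch only applies to the current process: if the code under test spawns workers with multiprocessing, the children may not inherit it, which could explain the trouble mentioned above.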
@hadrilec just one more question. For testing purposes, I'm caching results by monkeypatching … Is there a technical reason to have both? Do you think this would cause trouble to patch …?
I think either you or @tfardet modified … In a nutshell, if I understood well how this works, I would say that yes, indeed, we could remove … EDIT: after a second thought, what is written above is not accurate. Sorry for this. As …
@linogaliana or @avouacr could you please tell us what kind of GitHub account Insee has? The documentation of pynsee relies on Jupyter notebooks that are a bit heavy; @RLesur complained about it in #30. Consequently, @tgrandje, I guess the account is quite limited in terms of storage. To be confirmed by @avouacr. For the caching timeout, I think 15 days is ok.
@hadrilec @tgrandje We use the GitHub Free plan for each of our organizations, which is obviously pretty limited by default. For various purposes (mainly Onyxia-related storage) we buy a Git LFS data pack each year, which means we have 50 GB of LFS storage and 600 GB of bandwidth per 15 days. But there is no guarantee that we'll continue to buy this storage long-term. A more sustainable solution could be to use the S3/MinIO storage of the SSP Cloud to store your caching DB and make it public, so that the package/CI job can access it via a public URL. I think @linogaliana uses this strategy for the cartiflette package.
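As an illustration of the public-URL approach, a minimal sketch; the URL, bucket and object names below are made up, not an existing location:

```python
# Sketch only: fetch a shared cache database published on the SSP Cloud's
# public MinIO storage before running the tests (the URL below is hypothetical).
import pathlib

import requests

CACHE_URL = "https://minio.lab.sspcloud.fr/some-bucket/pynsee-tests/cache.sqlite"
CACHE_PATH = pathlib.Path("tests/cache.sqlite")


def download_cache(url: str = CACHE_URL, dest: pathlib.Path = CACHE_PATH) -> pathlib.Path:
    """Stream the shared cache file to disk so the tests can reuse it."""
    with requests.get(url, stream=True, timeout=60) as resp:
        resp.raise_for_status()
        with open(dest, "wb") as fh:
            for chunk in resp.iter_content(chunk_size=1 << 20):
                fh.write(chunk)
    return dest
```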
Good idea @avouacr. I'll try to fix something: with everything in cache, I've ended up with 1.7 GB of SQLite files. Do you think this can be deployed as it is (until I manage to work with MinIO)? Can you also confirm that you can upload to the public MinIO without the (daily) tokens? For …
+1 for the S3 solution @tgrandje. It is possible to create a service account that would not expire and would be provided to GitHub Actions secrets. We are not doing that in …
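A sketch of what the service-account side could look like, assuming the credentials are exposed to the CI job as environment variables (the variable, bucket and endpoint names here are assumptions, not pynsee's actual setup):

```python
# Sketch only: upload a regenerated cache with a non-expiring service account
# whose credentials come from GitHub Actions secrets mapped to env variables.
import os

import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://minio.lab.sspcloud.fr",  # hypothetical endpoint
    aws_access_key_id=os.environ["S3_ACCESS_KEY"],
    aws_secret_access_key=os.environ["S3_SECRET_KEY"],
)

# Push the cache so the next CI runs can download it from the public URL
s3.upload_file("tests/cache.sqlite", "some-bucket", "pynsee-tests/cache.sqlite")
```

In the workflow file, the secrets would simply be mapped to those environment variables in the job's `env:` block.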
@tgrandje I think it would be preferable to work with MinIO directly, as putting large files in a git project can end up being tedious to manage. Maybe you could start a PR with the current state of your work on caching and I could help set up the communication with MinIO?
Yeah: @tgrandje if the "deployed as it is" in your comment meant "tracked as files in git", then I really suggest finding another way, as this would be extremely impractical to manage.
What I meant was storing files through GitHub Actions' artifacts cache, not through git. That might have been practical enough and might have reduced data transfer, but I understand the cost is not trivial (as a matter of fact, I haven't found how to evaluate it on their calculator). The current branch is here. I first started to split the SQLite files between modules, as I thought the 500 MB threshold displayed in GitHub Actions' documentation was per file; they can easily be re-merged (I'll do that asap) to simplify the exchange with MinIO. It should also be easy to add a try/except in a …
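For what it's worth, a minimal sketch of that try/except, assuming a session-scoped pytest fixture and the hypothetical `download_cache()` helper from the earlier sketch:

```python
# Sketch only: fall back to live API calls when the shared cache is unreachable.
import pytest
import requests


@pytest.fixture(scope="session")
def api_cache():
    try:
        return download_cache()  # hypothetical helper fetching the MinIO cache
    except (requests.RequestException, OSError):
        # MinIO unreachable or cache missing: tests will hit the real APIs
        return None
```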
Right now tests are taking a lot of time and probably using a lot of bandwidth downloading everything.
It's also a problem when servers are down (like at the moment).
Suggestion: use cached data for the tests instead of downloading everything from the APIs on each run.
Pros: faster tests and lower impact on the servers; tests that are irrelevant to a PR do not fail.
Cons: need to check manually that the relevant tests ran properly for the PRs they concern.
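One possible way to handle that pro/con, sketched here as an assumption rather than the approach actually chosen: mark the tests that need the live APIs with a custom pytest marker and deselect them by default in CI.

```python
# Sketch only: a custom marker (to be registered in pytest.ini or pyproject.toml)
# separating live-API tests from the ones that can run on cached data.
import pytest


@pytest.mark.live_api
def test_full_download_from_live_api():
    ...


# CI would then run `pytest -m "not live_api"` by default, and a maintainer
# triggers the live_api subset manually when a PR actually touches those paths.
```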