Eliminate large files from git history #5
Q: several of those large files are graphics useful for examples and
documentation.
How can I leave them in, say, the online readme without requiring readers
to download them?
On Tue, Jul 2, 2019, 2:03 PM Richard Barnes wrote:
The repo contains a number of large files that you likely meant to ignore;
the largest are listed below. Collectively, they make the repo a 100MB
download.
41e6f427c11b 7.7MiB analysis/output_files/ALT_DATA2_OUT/fft/fft_results.gif
a8267f9be190 7.9MiB analysis/output_files/results_1/xcor/cross-correlations.txt
e271bcab6381 11MiB analysis/output_files/results_1/fft/fft_results.gif
669261e09a05 21MiB analysis/output_data/ALT_DATA1_OUT/xcor/cross-correlations.txt
36cbe3d82cf2 36MiB scripts/core.45511
4ac01836f00a 36MiB scripts/core.53132
9c2bb6f1759f 36MiB scripts/core.171982
a6cecc16b57b 57MiB analysis/output_data/ALT_DATA1_OUT/fft/fft_analysis_animation.gif
6def6506d3f7 66MiB scripts/GENESIS.log
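A listing like the one above can be reproduced with standard Git plumbing. The thread does not say which command produced it, so the sketch below is an assumption. `largest_blobs` is a hypothetical helper that walks every ref, keeps only blobs, and prints the N largest as abbreviated id, size in bytes, and path. It assumes SHA-1 object ids (40 hex characters) and reports raw byte counts rather than MiB.

```shell
# Hypothetical helper: print the N largest blobs reachable from any ref,
# smallest first, as "<12-char blob id> <size in bytes> <path>".
# rev-list emits every object with its path; cat-file adds type and size;
# sed keeps blobs only; sort/tail pick the largest; cut shortens the id.
# Assumes SHA-1 object ids (40 hex characters).
largest_blobs() {
  local n="${1:-10}"
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
    sed -n 's/^blob //p' |
    sort -n -k2,2 |
    tail -n "$n" |
    cut -c 1-12,41-
}
```

You can run it from inside any clone, including the bare mirror, for example `largest_blobs 10`. Running it again after the cleanup is a quick way to check that the big files are gone from every ref.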
These can be removed with the BFG Repo-Cleaner
<https://rtyley.github.io/bfg-repo-cleaner/> using the following commands:
git clone --mirror https://github.com/kellykochanski/rescal-snow.git
java -jar ~/Downloads/bfg-1.12.13.jar --delete-folders 'output_files' rescal-snow.git
java -jar ~/Downloads/bfg-1.12.13.jar --delete-folders 'output_data' rescal-snow.git
java -jar ~/Downloads/bfg-1.12.13.jar --delete-files 'core.*' rescal-snow.git
java -jar ~/Downloads/bfg-1.12.13.jar --delete-files 'GENESIS.log' rescal-snow.git
java -jar ~/Downloads/bfg-1.12.13.jar --delete-files '*.o' rescal-snow.git
java -jar ~/Downloads/bfg-1.12.13.jar --delete-files '*.py~' rescal-snow.git
# Perhaps `scripts/DUN.csp` is also a temporary file? It takes up 10MB.
After that, check that everything looks right, and then run:
cd rescal-snow.git
git reflog expire --expire=now --all && git gc --prune=now --aggressive
The upside is that this reduces the repo size to either 11MB (with DUN.csp)
or 1MB (without DUN.csp), which saves bandwidth and space for users.
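To confirm the size reduction, a sketch like the following wraps the two cleanup commands above. It prints Git's packed size before and after, as reported by `git count-objects -vH`. The function name `shrink_and_report` is hypothetical. It is meant to be run inside the mirror clone after the BFG passes.

```shell
# Hypothetical helper: expire every reflog entry, repack, prune unreachable
# objects, and report the packed size before and after.
shrink_and_report() {
  printf 'before: '
  git count-objects -vH | grep '^size-pack:'
  git reflog expire --expire=now --all &&
    git gc --prune=now --aggressive --quiet
  printf 'after:  '
  git count-objects -vH | grep '^size-pack:'
}
```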
|
They must be in the repo to appear in the readme, unless you host them elsewhere. However, I don't think any of the files I've suggested purging are currently used by the repo; they are (I think) all large files that were mistakenly committed in the past. Removing from the repo using The files you show on the readme are stored in |
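One way to keep files like these from being committed again is to ignore them. The sketch below is an assumption, not something settled in this thread. The hypothetical `ignore_scratch_files` helper appends patterns modeled on the BFG commands to `.gitignore`. The pattern `core.[0-9]*` is deliberately narrower than BFG's `core.*`, so that a source file named, say, `core.c` is not ignored. Ignoring `output_files/` and `output_data/` assumes nothing under those directories needs to be tracked.

```shell
# Hypothetical helper: append ignore rules for the kinds of files purged above.
# Run from the top level of a working clone (not the bare mirror).
ignore_scratch_files() {
  cat >> .gitignore <<'EOF'
# core dumps such as scripts/core.45511
core.[0-9]*
# simulation log
GENESIS.log
# compiled objects and editor backups
*.o
*.py~
# generated analysis output (assumes none of it needs to be tracked)
output_files/
output_data/
EOF
}
```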
@kellykochanski: I thought we were fixing this prior to JOSS? |
I haven't had time to get to it, and don't want to rush into messing with the git history. |
Okay. Can we chat about it prior to JOSS acceptance?
(in reply to Kelly Kochanski, Sat, Sep 21, 2019 at 11:46 AM)
|
@r-barnes I used bfg as you suggested, and the repo is now 14MB (this includes removing DUN.csp; I think some additional docs with figures have been added since you opened this). |
Doing this before merging outstanding PRs could make doing so impossible or
difficult...
(in reply to Kelly Kochanski, Thu, 26 Sep 2019, 08:10)
|
bfg warned me... Any issue with just repeating the bfg calls after accepting the PRs? |
I just went through a similar process with another repository, although the issue there was more about pruning and relocating sensitive information prior to open-sourcing a software package. I discovered that GitHub has write-protected refs for PRs. This means that you cannot prune data from these by default. However, I think I have special settings in my git config to fetch these PR refs that most users do not have, so this may not be a real issue (at least not if you're only concerned about repo file size; it certainly is when you're removing sensitive info). If it turns out that the PR refs keep the repository size bloated, then the only solutions are either:
1. Contacting GitHub support and asking them to delete the old PR refs (I'm not sure if they can/will do this for you)
2. Deleting and recreating the repository.
Hopefully you won't need to do either and the PR refs won't muck this up for you. |
@zbeekman: Cool idea! So that cleans the whole repo and associated PRs all
at once?
(in reply to zbeekman, Thu, 26 Sep 2019, 08:36)
|
Not 100% sure what you're talking about here. If it's my point 2, "Deleting and recreating the repository", then I need to explain a little further. What I really mean is:
I would not recommend this unless the repo size stays large after a normal pass with BFG. Even then, it's much easier to contact GitHub support and ask if they can delete the old protected PR refs. I had to go through this procedure because I realized that, upon open-sourcing a repository, you could still access old PR refs that included sensitive information that cannot be made public. If you do not need to do it, then please don't. Also, if you haven't run BFG yet to prune history, you may want to do it either before the final submission or not at all. I'm not sure whether it will mess with JOSS's machinery, the DOI process, etc., and it will certainly affect tagging. |
@zbeekman I ran bfg on the repository, though the changes were rejected from the then-open PR on kk/JOSS-fixes. A download of rescal-snow is now 14MB, down from ~100MB. I expect to have all open PRs closed at the time of JOSS acceptance, and will re-run bfg then. I can do this after finishing the corrections from your review and merging the kk/JOSS-fixes branch, but before formal JOSS acceptance. I hope bfg will work smoothly if all PRs are closed... Let me know if you think that it won't. |
@kellykochanski: Yes it should work fine. IMO, you have images and stuff for the tutorials, and 14MB is probably how much space everything you want to keep takes up. But at the end of the day, I wouldn't bother with any steps that are more complicated than what you are doing. If you get complaints about rejected refs when you try to push due to PR refs, you can just delete them locally then try pushing again. (They will persist on the GitHub side, but I suspect this is fine and most people don't fetch them.) |
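Deleting the PR refs locally could look like the sketch below. Run it inside the bare mirror clone before pushing again. It assumes the rejected refs are GitHub's pull-request refs under `refs/pull/`. `drop_pr_refs` is a hypothetical helper that feeds one `delete` command per ref to `git update-ref --stdin`, so all deletions are applied together. As noted above, the refs still persist on the GitHub side.

```shell
# Hypothetical helper: delete every local pull-request ref (refs/pull/*).
# for-each-ref prints one "delete <ref>" line per ref, and update-ref --stdin
# applies them in a single transaction.
drop_pr_refs() {
  git for-each-ref --format='delete %(refname)' refs/pull/ |
    git update-ref --stdin
}
```

Afterwards, a `git push` from the mirror should no longer try to update those refs.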
@zbeekman: The issue is that the repo's history contains ~86MB worth of
large temporary and output files which we accidentally committed and later
removed.
(in reply to zbeekman, Thu, Sep 26, 2019 at 11:26 AM)
|
[Edited for improved clarity 🤞] @r-barnes: I'll pipe down and let you guys figure out what you want to do. My point was that it sounds like Kelly had success with BFG and got things down to 14MB. Deleting the entire GitHub repository and re-creating it is (hopefully) beyond the scope of what you want/need to accomplish. At any rate, sorry for the confusion, and feel free to ignore my previous comments. If you run into trouble pushing back up to GitHub after running BFG, let me know; it might be the PR refs issue, and I may know the solution. Either way, I'd happily take a look. |
@zbeekman: No worries, thanks for your help.
(in reply to zbeekman, Thu, Sep 26, 2019 at 12:08 PM)
|