Bad object refs on environment git repositories #1399
Comments
We experienced this too with version 4.0.2 and 4.1.0. In PR #1371 |
I'm on r10k 4.1.0 and get these errors, too.
It always happens when developers clean up the git history of their private branches.
I'm able to mitigate this manually, but that is clumsy when r10k is called via webhook. |
I looked into this and I believe the root cause is that our thin environment repos use the cache repo as a "reference" or shared repo. This means the thin repo is essentially a working tree pointing into specific objects held in the cache repo. Sometimes, when the upstream is rebased and pulled down to the cache repo (which is a mirror of the upstream) and garbage collection happens, the objects that the thin repository points to will no longer exist. See https://git-scm.com/docs/git-clone#:~:text=NOTE%3A%20this%20is,will%20become%20corrupt I'm not sure how best to fix that issue, as doing anything besides sharing with the cache repo will be slower and use more space on disk. |
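To make the "reference" relationship described above concrete, here is a minimal sketch in Python that shells out to git. It assumes the sharing works like `git clone --reference` (the option the linked git-clone note is about); the repository names and the `demo_reference_clone` helper are hypothetical, and this is not r10k's actual code. It builds a tiny upstream, a mirror standing in for the cache, and a clone that borrows objects from the mirror, then reads the clone's alternates file.

```python
import pathlib
import shutil
import subprocess
import tempfile


def git(*args, cwd=None):
    """Run a git command and return its stdout as stripped text."""
    result = subprocess.run(
        ["git", *args], cwd=cwd, check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def demo_reference_clone():
    """Return the alternates entry of a clone made with --reference.

    Returns None when git is not installed. All repository names here are
    made up for illustration.
    """
    if shutil.which("git") is None:
        return None
    with tempfile.TemporaryDirectory() as tmp:
        root = pathlib.Path(tmp)
        upstream = root / "upstream1"
        cache = root / "cache1.git"
        thin = root / "production"

        # A one-commit upstream repository.
        git("init", "--quiet", str(upstream))
        (upstream / "site.pp").write_text("node default {}\n")
        git("add", "site.pp", cwd=upstream)
        git("-c", "user.name=demo", "-c", "user.email=demo@example.com",
            "commit", "--quiet", "-m", "commit a", cwd=upstream)

        # The cache is a full mirror of upstream.
        git("clone", "--quiet", "--mirror", str(upstream), str(cache))
        # The environment clone is told to borrow objects from the cache.
        git("clone", "--quiet", "--reference", str(cache),
            str(upstream), str(thin))

        alternates = thin / ".git" / "objects" / "info" / "alternates"
        return alternates.read_text().strip()


print(demo_reference_clone())
```

The printed path should point into the mirror's `objects` directory. If objects are later pruned from that directory, a repository relying on this entry can end up pointing at commits that no longer exist, which is the failure mode the comment describes.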
I believe @bastelfreak mentioned this elsewhere, but changing the refs that tags point to isn't a git best practice and is not supported by r10k. Let us know the use case where branches don't work for you. I have a feeling that a lot of confusion comes from the fact that Docker doesn't have "branches" like git does; instead it implements a branch-like feature it regrettably calls "tags". |
I agree with @justinstoller here. And my comment was at #1371 (comment) |
@justinstoller just to confirm, the |
@nabertrand I believe the issue is with the rebasing. Let me give a more detailed rundown of how I think r10k and git are interacting: I have a git repo hosted on GitHub, let's call it "upstream1", with a production branch, and in git's object database it has this commit information for the production branch:
R10k pulls down a full mirror of "upstream1" in its cache dir. Let's call that repo "cache1"; it contains an exact match of all the git data in "upstream1". R10k then creates a repo "production" and tells it to reference the git data in "cache1". This repo "production" is treated essentially as a copy-on-write clone of "cache1". It contains a worktree checked out at commit d, but its git dir simply says "I'm a repo pointing to commit d in cache1". On my dev machine I rebase my local history, collapsing commit c and commit d into one new commit with a better commit message. Git will now treat that as a new commit, commit e. I force push that to "upstream1" and now upstream1's git object database looks like this:
Commit c and commit d still live in the git object database but they are unreachable by users. At some point before deploying with r10k a git gc is run. It can occur after a commit, so let's say I add commit f on top of my rebased branch and trigger a gc which cleans up the orphaned git objects. Now my git objects look more like this:
And commit c and commit d have been garbage collected. Now I do a deploy and "cache1" is updated to look exactly like "upstream1". Then r10k sees that the deployed commit of production is commit d and the head of cache1's production branch is commit f. R10k asks the production repo to update to commit f. It does so by saying to cache1, "I'm on commit d, send me the information on how to go from commit d to commit f." To which cache1 says, "I don't know anything about commit d, I can't tell you how to get to commit f," and the process fails with something about "did not send all necessary objects".

In a full clone the production repo would come back and say, "Well, I believe the parent of commit d is commit c, do you know about that?" and then "The parent of commit c is commit b, do you know about that?" Then the repo would be able to construct a path from commit d to commit f: drop d, drop c, apply e, apply f.

I think if we did a shallow clone of depth 1 we'd have the same issue in reconciling git histories, but we'd also have copied all the git data from cache1 to production. So: same issue, but slower and using more disk space. However, if we did a shallow clone of depth 3 in the above case, we'd have cloned the last three commits (commit d, c, and b) to the production repo and been able to reconcile our commit histories.

So I do think there's a way to solve the issue. But we'd need a shallow clone of depth X, where X is more commits than most folks would rebase away. And it would make r10k take up more space and run slower for everyone, which I'm not sure is a good tradeoff. |
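The walkthrough above can be sketched as a toy model in Python. This is only an illustration of the reasoning, not how git or r10k are implemented: commits are single letters mapped to their parent, `collect_garbage` stands in for git gc, and `find_common_commit` stands in for the negotiation between the production repo and cache1. All function and variable names are hypothetical.

```python
def reachable(objects, tips):
    """Return every commit reachable from the tips by following parents."""
    seen = set()
    stack = list(tips)
    while stack:
        commit = stack.pop()
        if commit is None or commit in seen or commit not in objects:
            continue
        seen.add(commit)
        stack.append(objects[commit])
    return seen


def collect_garbage(objects, refs):
    """Drop commits that no ref can reach (toy stand-in for git gc)."""
    keep = reachable(objects, refs.values())
    return {commit: parent for commit, parent in objects.items() if commit in keep}


def find_common_commit(head, local_objects, shared_objects, target):
    """Walk back from the deployed commit until a commit in the target's
    history is found. Returns None when the walk hits a missing object or
    runs out of history first."""
    visible = {**shared_objects, **local_objects}
    target_history = reachable(shared_objects, [target])
    commit = head
    while commit is not None:
        if commit not in visible:
            return None  # the deployed commit (or an ancestor) is gone
        if commit in target_history:
            return commit
        commit = visible[commit]
    return None


# cache1 after the force push and commit f: c and d are unreachable but present.
cache_before_gc = {"a": None, "b": "a", "c": "b", "d": "c", "e": "b", "f": "e"}
refs = {"production": "f"}
cache_after_gc = collect_garbage(cache_before_gc, refs)

thin = {}                                   # borrows every object from cache1
depth_1 = {"d": None}                       # own copy of d only, parent cut off
depth_3 = {"d": "c", "c": "b", "b": None}   # own copies of d, c and b

print(sorted(cache_after_gc))                                 # ['a', 'b', 'e', 'f']
print(find_common_commit("d", thin, cache_before_gc, "f"))    # b
print(find_common_commit("d", thin, cache_after_gc, "f"))     # None
print(find_common_commit("d", depth_1, cache_after_gc, "f"))  # None
print(find_common_commit("d", depth_3, cache_after_gc, "f"))  # b
```

In this model the thin repo can still reconcile histories while c and d exist in the cache, fails once they are collected, and a depth-3 local copy recovers because it still holds the path from d back to b.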
Thanks for looking into this @justinstoller. What about allowing the user to customize how often loose objects and reflog entries are garbage collected? The performance and space hit of bumping the thresholds might be unacceptable/unnecessary for some sites, so making it customizable could allow users to tune the values as needed. Specifically, we might want to customize:
|
I think that'd be a good idea. Is your environment a FOSS or PE install? I don't see us calling git-gc within r10k, but I do know we do some gc in PE via other tools. Before we add options to r10k it would be nice to validate them. If your environment is a FOSS install and reproduces this fairly regularly could you try setting some of these values in the gitconfig for the user r10k runs as? The defaults they list seem fairly benign (eg cleaning up unreachable commits more than 30 days old) but there may be some interaction there causing issues. |
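As a rough sketch of the kind of experiment suggested above, the following Python snippet writes a few garbage-collection settings with `git config`. The specific options the thread had in mind are not recoverable from this text, so the keys below (`gc.auto`, `gc.pruneExpire`, `gc.reflogExpireUnreachable`) are an assumption: they are real git settings, but the values are illustrative only. The sketch writes to a scratch file via `--file`; for a real test one would presumably use `--global` as the user r10k runs as.

```python
import pathlib
import shutil
import subprocess
import tempfile

# Assumed example values, not taken from the thread: switch off automatic gc
# and never expire unreachable objects or reflog entries.
EXAMPLE_GC_SETTINGS = {
    "gc.auto": "0",
    "gc.pruneExpire": "never",
    "gc.reflogExpireUnreachable": "never",
}


def git(*args):
    """Run a git command and return its stdout as stripped text."""
    result = subprocess.run(
        ["git", *args], check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def write_gc_settings(config_path, settings):
    """Write each setting to config_path and read it back.

    Returns None when git is not installed.
    """
    if shutil.which("git") is None:
        return None
    for key, value in settings.items():
        git("config", "--file", str(config_path), key, value)
    return {
        key: git("config", "--file", str(config_path), "--get", key)
        for key in settings
    }


with tempfile.TemporaryDirectory() as scratch:
    print(write_gc_settings(pathlib.Path(scratch) / "gitconfig", EXAMPLE_GC_SETTINGS))
```

If the errors stop with collection effectively switched off, that would point at gc as the trigger; the values could then be relaxed toward something that still reclaims disk space.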
@justinstoller our environment is a FOSS install, but I thought perhaps the garbage collection was happening automatically when other non-gc git commands were run. I'd be glad to test this out, but won't have time until after the US holiday break. Could you re-open this issue? I think it was closed automatically when #1410 was merged. |
@justinstoller I'm currently testing setting |
That testing strategy sounds great. I think we should go ahead and do a release very soon while you continue with the testing. I've actually been validating another PR that came in this week #1412 and cleaning up our internal CI. We don't do releases on Fridays and I've been hesitant to release on a Thursday. But I've gotten everything through CI and I'm inclined to do a release later today since we've been putting it off for so long unless folks have concerns. |
Hello
We use r10k to create Puppet environments based on an active git repository.
Sysadmins tend to create feature branches (and force-push in their dev environment).
Describe the Bug
Some environments (/etc/puppet/code/environments/dev_XXX) may get "stuck": git operations fail with something like:
I encountered two types of issues:
r10k/lib/r10k/git/shellgit/thin_repository.rb
Line 33 in 99505c6
Expected Behavior
On environment repositories (/etc/puppet/code/environments), maybe r10k should not do a "git fetch cache" as we just need code for a specific branch.
Steps to Reproduce
I think my issue is a race condition on an active repository (i.e. concurrent r10k environment deploys) while people are issuing "git push --force" on branches
Environment