-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Multi-level anchoring modes #3
Comments
“Above small team that also wants to be able to squash and amend comments”Suggestion: add “without losing signatures that were done beforehand”. Also, in the drawbacks, the one you listed is kind of laughable. Much more important is the possibility of replay attacks where the attacker can push a new commit (so no history rewriting) to make the tree back to what it was before at a vulnerable version, and re-use signatures from there (esp. given the “with” of the current RFC will not help you, as you're signing a tree anyway). The only countermeasure to reduce the impact of this vulnerability I've heard is relying on time to check the signature… and time is a well-known offender. Most systems synchronize through NTP servers, which can be used to arbitrarily slow down or in certain cases even revert the passage of time. Also, even without considering that, the question of what is a reasonable time limit is hard to do. So the time-based “protection” sounds like a very very fragile protection to me: I think we should, in the threat model, assume that tree-id signing does allow replay attacks. Finally, a tree-id signature can be only be signing a state, not a diff (taking terms from the current RFC): the tree-id doesn't hash the previous commit-id. Which means tree-id signatures should be much harder to do than commit-id signatures. With these points in mind, do you still think that it'd be a better trade-off for a small team that wants to be able to squash and amend comments, rather than simply re-signing after the squash/amend? I'm thinking the cost of having to re-review all the code at each signature would be a lot higher than the cost of re-signing. And, if signing by only checking one's review and trusting previous reviews, then an attack scheme may be:
Now, if you still believe this is some possibility you want to offer your users (I personally consider it just like GnuPG's interface streaming un-checked data and only adding a message at the end if an attack is detected: a high risk to security, because it's a dangerous interface), then I think it should come with the following points in the RFC:
“An organization with many in-flight diffs and reviews merging out of order”So… giving it more thought, I think:
If it's added to the RFC, I think it should be done with the following provisions:
These requirements are enough to implement your CI system where maintainers take care of ensuring correct ordering of the patch: then, maintainers would be the one doing the “authoritative for verification” signature, but they would keep the patch-id signature trail for forensics in case a backdoor is ever discovered. Also, it avoids having this section about “you are knowingly creating a security hole and are a terrible person”: the user cannot create a security hole, as no tools would exist to automate verification @oxij I've quite reduced my opposition in this message compared to previous times. I think with the provisions I added to the end of my paragraphs to the RFC it should be relatively secure (ironically, I'm more afraid of tree-id signing than of patch-id signing with these restrictions, as patch-id signing would be well-limited by the tool). You maybe have an opinion stronger than me on this question? |
@Ekleog your arguments on tree-id make sense to me based on my current limited understanding of tree-id. Personally I think the "patch-id + tagging signed commits to lock in order" flow makes the most sense for my use cases. I did attempt to see if tree-id was defensible but perhaps it is not. I don't actually the idea of mutating comments and squashing without changing the signature as I feel like it would, if nothing else, introduce social engineering vectors. patch-id+tagging flow where authors are saked to squash patches before review (if required by the maintainers) seems like a reasonable ask, and still allows the flexibility to have multiple in-flight diffs. I don't have further arguments for including tree-id and I don't think I would want to use it with the tradeoffs it poses over commit/patch signing for the orgs I support. If @daurnimator (who introduced the potential of tree-id in the first place) can come back with a solid use case for it in spite of these concerns, then your provisions also makes sense. By the way, you may of noticed about me by now that I will play devils advocate for just about anything to encourage deeply thinking about problems. My own true stances are often much less defined at the outset, and debate is often the very thing that helps me define them. Well argued! :) |
When you review/build code, you review the current state: you don't care what it was before. Signing a tree is the closest analogue to what most people actually review. I suggested it due to two reasons:
|
@daurnimator I think the Reviewed-By tag is only tagging the diff of the commit, in the context? At least that's how I've seen it used. For instance, in nixpkgs (or other similarly million+-lines projects), it's impossible to imagine reviewing the whole state, because if you do so then by the time the review is finished a big part of it has already changed. That's why we switched the idea to having an initial Now, if I understand correctly you're not strongly arguing in favor of tree-id, so that's just for the explanation of the reason why I think tree-id wouldn't be that useful. :) @lrvick I also tend to playing the devil's advocate for the same reason, though not that much in these discussions… so I completely understand you :) So I think if @oxij agrees, we could go this route for the commit-id, tree-id and patch-id signatures:
|
-totally- separate command seems ambiguious here. I was thinking "git signatures show" could show all signatures and what type they are at a glance, and that "git signatures verify" would only verify against commit signing unless you pass it an explicit flag to verify patch-ids with strict warnings on intended use cases around this in the documentation. |
Well, my point was actually that
This would be rendered unsafe by patch-id, and no amounts of warnings will prevent users from doing it anyway (see Efail, the unsafe behaviour was documented, yet almost all clients handled it wrong) However, patch-id signatures MAY be verified securely so long as the user cannot expect it to verify more than what it can verify: by having a I think this command is enough to implement your “An organization with many in-flight diffs and reviews merging out of order” model, and in addition it would be safer than with the full-blown |
Before we discuss semantics can we discuss use-cases first? What are the use cases for anything but commit-id signing?
### Small team willing to allow only one "ready to merge" changeset at a time
Use "commit" level anchoring and truck on.
SGTM.
### Above small team that also wants to be able to squash and amend comments
...
Why change anything here? IMHO both author and committer should agree on the result, hence if someone squashes and you agree you just sign the result, if you disagree, you complain (out of band or by signing a "-" review).
### An organization with many in-flight diffs and reviews merging out of order
...
Why change anything here?
I think "I squashed all of this in <hash>, please review and sign if it's ok" should be the way to do this anyway.
Moreover, if nobody squashes/rebases anything all your signatures attached to your commits stay valid on merges. And if you don't like the merge itself you should complain at the merge commit.
In short, I agree that tree-id is both dangerous and useless and patch-id signing can be dangerous but is somewhat less useless if you always rebase instead of merging. But in a rebase-workflow case I think authors should just resign the result. Why can't authors just resign the result?
Sure, we can have a tool that does the checking by patch-id and resigning automatically signer-side comparing by patch-ids, but I'm medium-level against doing it verifier-side as we then would need to agree and encode in patch-id:
- patch format used for computing patch-ids,
- diff algorithm used to generate patches,
- diff algorithm parameters like context size and etc.
As verifier would need to regenerate the patch to generate patch-id for comparison.
This is much less stable than a simple hash of history state + tree state. Commit-id signing will survive _any_ changes to any git algorithms and data formats, oid formats, etc since all it does is signs a hash (or a bunch of them together) of a persistent data structure. These signatures can even survive git itself if verifier could reconsturuct current git hashing for another VCS. Meanwhile <patch-id> signing is very this-git-version-diff-tree-called-with-these-parameters-specific.
What does make sense to me, however, is signing a diff between a pair of tree-ids. I.e. like a signed commit, but without a commit message.
|
I can think of a use case for this: when authors don't have access to the full repository their commits are being rebased into, for $corporatereasons. Now, thinking about it more… patch-id signing as I thought it (ie. as a paper trail for maintainers down the road to prove they didn't invent the patches themselves) can actually be implemented through commit-id signing:
Now, maintainer's commits are technically signed only by maintainer, but if any problem is found in them, maintainer can be taken as responsible for it until they produce the review proof they have from author and reviewer. Some (site-specific?) tooling could in addition link previous review proofs and maybe automatically verify them and check the patch-id didn't change, but that doesn't require direct patch-id signing (thanks @oxij for the demonstration of stability issues of it :)) |
* Rebases, squashes or whatever it wants the patch to master, and signs the rebase
And then the author can sign it too and maintainer can copy that signature into upstream's review-branch, if desired. This is allowed because dates are "serial numbers".
Some (site-specific?) tooling could in addition link previous review proofs and maybe automatically verify them and check the patch-id didn't change.
Yes, also maintainers can sign cherry-picks by other maintainers if the original commit has the same patch-id as the cherry-pick and the original was already signed. E.g. you can cache patch-id of commits you signed and then run something like `git wotr resign-cherry-picks HEAD~20..HEAD` and it would do that.
|
Actually, this resigning could even be done by CI systems. So the idea of replacing patch-id by commit-id + keep reference to commit & parent sounds good to me, as tooling can make the two just as easy to use (well, so long as you're not emailing patches around, but if you're emailing patches around then patch-id signing wouldn't help you anyway, as it's stored in an obscure blob with an obscure format in an obscure reference of the repository… just sign the mail :)) |
I think this adds a lot of complexity, and patch-id should be very stable now and the output with the flags used in git-signatures should never change. Unless I am missing something this requires that a maintainer keep all their proofs around by hand and any feature branches be preserved. This makes it very difficult for there to be a first-class tool to show the full paper-trail for a given commit in master at a glance. In practice I feel this means a maintainer will just have exclusive trust for master. Auditing should be -easy- and be able to be done by anyone, not just the maintainer which may be hit by a bus. Also consider popular (but admittedly shitty) review tools like Phabricator that fully detach a patch and submit it to a database for review. They are then later applied via "arc land" by a maintainer often to a totally different repo for, as stated, $corporatereasons. git-wotr signatures could be attached and re-imported into the new repo. Also consider all the many open source projects that actually only accept patches by email. A signature on the email can't be easily imported into the upstream repo. A patch attachment with sibling attachment containing a one-line git-wotr signature that contains a patch-id could. It is also possible I am totally misunderstanding your suggestion here. If so then please provide some specific examples so my mind can grock this :) |
@lrvick So what I'm saying is: A patch-id is a restricted commit-id, with less determinism. In your two examples, you transfer a patch as well as a patch-id signature. What I'm saying is, you can replace a patch-id signature by some blob a bit bigger (handled by your tool) that's composed of:
With these informations, you actually convey exactly as much information as a patch-id signature, except:
In exchange, the tarball of the tree before the commit must be exchanged in some way… but anyway this will likely be a commit on |
Your stated strategy aready assumes we need a stable diff command:
The explicit way I am using patch-id and diff-tree (As suggested by @rafasc) can be reasonably expected to never change:
This is dramatically simpler than the very large and complex blob, and complex programming logic that would be needed to avoid it. And yes there are plenty of reasons one might want to ship a valid patch to many different projects. Say for instance re-writing a core crypto function that impacts 100 different bitcoin forks. Maintainers of each fork pull in that upstream patch and consumers have strong assurance the maintainer didn't make it up. Deciding the -order- it is merged is still on the maintainer. They can just prove where the patch originated and perhaps reference any testing/research/review that went with it on top of a similar codebase for performance or whatever else. As discussed elsewhere: patch-id -is- a lower verification level than commit-id but the most critical use case it supports is allowing multiple in-flight patches to be merged into a tree where the order is determined by the maintainers. The maintainers can pull in patches and responsible for making sure they are merged in a safe order. The authors and reviewers will of course be signing commit-ids as well, and that will allow maintainers to know the context they reviewed under and consider that in their choice of merge order. Patch-ids are useful for trust between various stages of a release process but only signed object ids on master or tagged releases should generally be trusted by downstream consumers as only that can lock in the order. My current understanding of your proposal is that it adds significant complexity and throws out the possibility of having multiple PRs all staged at once for merging by maintainers in any order of their choosing. This is impractical for almost every company feature-branch workflow I am aware of. |
No it doesn't. It just assumes that you can apply an old diff: the diff is here only as an optimization to not store the two full trees. The two full trees, once generated using the stored diff, can then be diff'd with the now-current Basically I'm not storing the patch-id, but the full diff.
You disabled git/git@33de716 . Right. So you're missing git/git@5580b27 . And likely others I don't know about. And more that will come in the future. I think when even patch-id's documentation only claims “relatively stable” (which is, assuming identical files as the input, so not even considering
Well, for this use case it's really just a
Said complexity is only when you move patches around in a way that is not using git, ie. phabricator or email. When you have direct git access, the tool can just automatically fetch/push the hash referenced by And even then, it is only complexity incurred by the tooling wishing to support this use case, as the inputs and outputs will be the same: a patch and a signature on said patch. Except the fact that the patch signature based on commit-id will be more robust, at the expense of being more heavy in case the commit on which the patch has been initially reviewed isn't publicly accessible (in which case it'd have to add a tree tarball to the signature to be secure) -- I assume the weight of the diff itself would be negligible in most cases. However I'm not sure I explained my point clearly: I'm not saying you should drop the idea of signing patches. I guess some specific use case could require it, even though I'm not willing to support these in the tool I'm expecting to write unless I get paid for it. What I am saying is that you can sign patches by signing their commit-id with a regular The advantages are:
|
I guess I am having a hard time following the exact setup you are recommending here, so I'll take a stab at trying to translate into pseudocode to see if I am following. Lets say I make a PR against the current HEAD of master. I append a note signature to the HEAD commit of my branch containing something like:
Where $rebase-commit-id was the current HEAD this diff was signed against. From there a several other people make similar PRs in feature branches. One of these totally unrelated changes merges ahead of mine thus making every other signed PR/diff including mine no longer valid against the current head of master. You as maintainer in order to verify my commit would do something like:
A downstream consumer of the repo could then potentially verify a ref in master under a policy like:
Does this workflow example sound in line with what you are proposing? |
Some of the things I mentioned to @lrvick , I assumed this was being used on a certain context but reading this I clearly misinterpreted that context.
True,
https://github.com/mhagger/git-imerge/tree/v1.1.0 Does this and calls it rebase-with-history. But ... A rebased branch cannot be trusted just because it was made from trusted commits. Your signed and harmless commit could introduce a problem because it was applied by someone on top of something else harmless by itself. Similar situation occurs in merges and patch based approach, as you can introduce changes in the merge/apply commit itself. But at that point, the responsibility falls on the person who did the rebase/merge/apply, not the original author of the commits. Using your signature over notes extension, you could add your signature back to the upstream commits meaning: "I verify that the maintainer correctly integrated my changes". And I don't know of any good alternative without compromising something. Be it security, clean history, or ease to use. |
I'll conceed on the patch-id point if I can confirm I am understanding the alternative correctly. Clearly they are not as stable as I thought.
Yeah that is exactly the attack we want to be able to deal with. Signing commit-ids only is most secure but we need to have a secondary solution to support typical workflows with multiple in-flight diffs. Specifically we need a signing scheme that allows different parties to certify proof of authorship, proof of review, and merge order separately at different stages of the deployment lifecycle. Specfically we need to support a flow like:
Signing both the commit-id and patch-id would allow all of the above to be possible -if- patch-id were stable. I am trying to understand an alternative that still can check all these boxes. |
@rafasc I think we're pretty much in total agreement :) @lrvick The workflow you exposed unfortunately doesn't work perfectly, because the commit-id would be changed by rebasing, and may turn out not to be pushed to the maintained repository (or to be
This way of doing things even allows successive rebases to all keep traceability of who did which rebase and who assumed the other's rebase was correct. And it doesn't rely on stability of the It doesn't handle rebase conflicts, because if there is a rebase conflict then it's no longer Author's work and has no longer been reviewed by Reviewer, so these signatures should really be dropped. |
Okay. I better understand what you are proposing however it is very complex and is going to make implementation much harder than the patch-id setup. As far as I can tell there is no change in attack surface between your proposal and using patch-ids. The only reason we are doing all these gymnastics to avoid patch-ids then, is out of potential that that may not be stable. Coming at this from a different angle, lets say either patch-id is stable, or we dictate our own alternative to patch-id. For the sake of discussion lets say we put in the spec a specific deterministic diff algorithim then simply sha256 the output. Could call it "diff-id" We would then keep the signing format dirt simple by always signing the commit-id and this custom but stable diff-id. From there can use all the patch-id workflows as I outlined. TL;DR: Assume patch-id is stable or we make a drop-in stable alternative. Is this not then a better fit and easier to implement/verify for the outlined workflows? |
Okay. I better understand what you are proposing however it is very complex and is going to make implementation much harder than the patch-id setup.
Well… not really, I just made every point explicit. The changes that
impact complexity are:
* Author and Reviewer don't have to specify a flag to make a patch-id
signature (user-facing simplification)
* At step 5.3, instead of generating and signing a patch-id, Maintainer
signs some metadata trivially recovered from the signed commit
(tooling simplification)
* At step 7, instead of generating and verifying a patch-id, Consumer
patches a tree, commits it and verifies signature (tooling
complexification)
Overall, tooling would be about as complex (esp. given all its
cryptographic “security hot path” will be divided by two), and, from a
user point of view, it's a net simplification.
Also, this scheme means that the user need not willingly make patch-id
signatures: it's possible to re-use commits that people didn't design to
be rebased without losing the initial signature. (I assume when saying
this that we don't always sign the patch-id, because I think we already
agreed on the fact that it should in no way be the default behaviour.)
As far as I can tell there is no change in attack surface between your proposal and using patch-ids. The only reason we are doing all these gymnastics to avoid patch-ids then, is out of potential that that may not be stable.
Assuming the tooling is correct, and following the list I gave above,
then yes, there is no change in attack surface.
*However*, my proposal does:
* Reduce the amount of the code in the tooling that will have to handle
secrets (signing patch-ids vs. only ever signing commit-ids)
* Never sign something whose meaning is unclear (just imagine if I
asked you to sign “please transfer $5 to Ekleog” if you owed me $5…
would you do it, or would you include the date, an order number,
etc. in the signed text?)
* There are *much* more metadata kept. Sure, for automated tooling,
this won't have much use, but for a human even knowing on which
commit the patch was reviewed is already a great help. And the
metadata here also contains the commit date (included in the cat-file
output), which could be used by the tooling to forbid use of too old
patches
So, yeah, the attack surface is the same, because you asked me to do a
protocol that has the same end-result and the attack surface is inherent
to this protocol. But more secure protocols could be made with the
commit-id-based patch signing, while they couldn't be, based on patch-id
signing.
Coming at this from a different angle, lets say either patch-id is stable, or we dictate our own alternative to patch-id.
For the sake of discussion lets say we put in the spec a specific deterministic diff algorithim then simply sha256 the output. Could call it "diff-id"
If we do that then I'm 99.999% certain that the implementation of
patch-id signing is going to be *much*, much more complex than my proposal.
We would then keep the signing format dirt simple by always signing the commit-id and this custom but stable diff-id. From there can use all the patch-id workflows as I outlined.
TL;DR: Assume patch-id is stable or we make a drop-in stable alternative. Is this not then a better fit and easier to implement/verify for the outlined workflows?
If we always sign a diff-id, then yes it's possible.
But no, it wouldn't be a better fit for the workflow, because of the
drawbacks mentioned above.
And no, it's not going to be an easier fit for the workflow, because
designing our own stable diff algorithm that handles correctly rebases
and does as well as `git apply` (the only required for my scheme) is
going to be a *lot*, *lot* more complex than maybe the whole `git-wotr`
project.
|
Okay. Stewing on this and I am warming up to this idea provided I can find a simple to reason about set of steps to sign/verify by hand. I appreciate you expanding your explanations to address my use cases. I am currently most unclear on how this step would work.
If I am understanding this correctly your idea is recreate the PR branch that was signed, and make sure everything checks out in that context and that it gets the same commit hash. I am unclear how to use the cat-file -p output to recreate a commit in a branch and get the same hash. |
Yes, the idea is exactly to recreate the PR branch from the data, using the diff between As for how exactly:
Maybe there's a way to do it without messing with the staging area, by using git worktrees or similar. Now, disclaimer: I have written this in the github comment interface, meaning that I haven't security-checked it. Intuitively it should be OK, but there may be something I missed. No warranties, yadda yadda. |
This is great and gives me something concrete to play with. Thanks for taking the time.
Will circle back after some tire kicking.
…On October 24, 2018 8:00:28 PM PDT, "Léo Gaspard" ***@***.***> wrote:
Yes, the idea is exactly to recreate the PR branch from the data, using
the diff between `C-pr-new^` and `C-pr-new` (to check that the rebase
was indeed a no-op) and the metadata from the output of `git cat-file
-p`.
As for how exactly:
```
$ #
$ # SETTING THE TEST REPOSITORY UP
$ #
$ git init test
Initialized empty Git repository in /tmp/test/.git/
$ cd test/
$ touch test
$ for i in `seq 1 100`; do echo aaaa >> test; done
$ git add test
$ git commit -m "."
[master (root-commit) 40eb09d] .
1 file changed, 100 insertions(+)
create mode 100644 test
$ git checkout -b test-branch
Switched to a new branch 'test-branch'
$ sed -i '3s/aaaa/bbbb/' test
$ git commit -am "change aaaa to bbbb line 3"
[test-branch b2a250f] change aaaa to bbbb line 3
1 file changed, 1 insertion(+), 1 deletion(-)
$ git checkout master
Switched to branch 'master'
$ sed -i '50s/aaaa/cccc/' test
$ git commit -am "change aaaa to cccc line 50"
[master cd938ad] change aaaa to cccc line 50
1 file changed, 1 insertion(+), 1 deletion(-)
$ git checkout test-branch
Switched to branch 'test-branch'
$ git cat-file -p HEAD > metadata
$ git rebase master
First, rewinding head to replay your work on top of it...
Applying: change aaaa to bbbb line 3
$ cd ..
$ git clone test test2
Cloning into 'test2'...
done.
$ cd test2
$ #
$ # VERIFICATION
$ #
$ # here test2 never saw the initial test-branch
$ # now we're entering the step of verifying the validity of the commit
Branch 'master' set up to track remote branch 'master' from 'origin'.
Switched to a new branch 'master'
$ cat ../test/metadata | grep parent
parent 40eb09de44dc69e839d3b6ad0f922ae1e8ebff6a
$ git checkout 40eb09de44dc69e839d3b6ad0f922ae1e8ebff6a
Note: checking out '40eb09de44dc69e839d3b6ad0f922ae1e8ebff6a'.
[ ... skipped for brevity ... ]
$ git apply <(git diff master test-branch)
$ git add .
$ git write-tree
f565b40932ab5954dd4b43d47f026bd4c9670b9e
$ cat ../test/metadata | grep tree
tree f565b40932ab5954dd4b43d47f026bd4c9670b9e
$ # all is going good up to now (the check is important security-wise)
$ git hash-object -t commit
../test/metadatab2a250fe7648ce5dea07d50c14083f9ba8ab0353
b2a250fe7648ce5dea07d50c14083f9ba8ab0353
$ # it's the same hash than in the original branch, so we can check
signatures on it
```
Maybe there's a way to do it without messing with the staging area, by
using git worktrees or similar.
Now, disclaimer: I have written this in the github comment interface,
meaning that I haven't security-checked it. Intuitively it should be
OK, but there may be something I missed. No warranties, yadda yadda.
--
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
#3 (comment)
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
|
Story
We actually have 3 verification levels possible which each have different use cases and tradeoffs. I have taken to calling this the "anchor" status. The signature will be valid, but the anchor status will tell you at what security level it is anchored to the current tree we are validating it in.
Commit-id signing alone can not suit all use cases, so it is useful to sign all three types of anchors, and then leave it to the end client verifier to assert the trust level based on what security assumptions are true.
Trust Anchor levels
commit
tree
patch
Use Cases
Small team willing to allow only one "ready to merge" changeset at a time
Use "commit" level anchoring and truck on. It is the default and you need
special flags to verify any other way.
Above small team that also wants to be able to squash and amend comments
Consider "tree" level anchoring which offers greater flexibility.
Understand the tradeoff here is that a bad actor can squash all commits into
one and change the commit message to offensive ascii art, but can't actually
change the code.
An organization with many in-flight diffs and reviews merging out of order
Consider "patch" level anchoring for authors/reviewers.
Understand that this can only certify that a given changeset was authored and
reviewed in isolation.
You -must- pair this with a CI system or maintainer(s) that will do one of:
If you fail to do one of the latter steps to prevent history mutation you are
knowingly creating a security hole and are a terrible person.
The text was updated successfully, but these errors were encountered: