Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multi-level anchoring modes #3

Open
lrvick opened this issue Oct 6, 2018 · 23 comments
Open

Multi-level anchoring modes #3

lrvick opened this issue Oct 6, 2018 · 23 comments

Comments

@lrvick
Copy link

lrvick commented Oct 6, 2018

Story

We actually have 3 verification levels possible which each have different use cases and tradeoffs. I have taken to calling this the "anchor" status. The signature will be valid, but the anchor status will tell you at what security level it is anchored to the current tree we are validating it in.

Commit-id signing alone can not suit all use cases, so it is useful to sign all three types of anchors, and then leave it to the end client verifier to assert the trust level based on what security assumptions are true.

Trust Anchor levels

commit

  • Signature valid against the commit hash of the ref we are verifying
  • Signature invalid after squashing
  • Signature invalid after changing commit messages
  • Signature invalid after rebasing
  • Signature invalid if changeset is moved to a new location in this repo or another repo

tree

  • Signature valid against tree hash of the ref we are verifying
  • Signature valid after squashing
  • Signature valid after changing commit messages
  • Signature invalid after rebasing
  • Signature invalid if changeset is moved to a new location in this repo or another repo

patch

  • Signature valid against patch id hash of the changeset we are verifying
  • Signature valid after squashing
  • Signature valid after changing commit messages
  • Signature valid after rebasing
  • Signature valid if changeset is moved to a new location in this repo or another repo

Use Cases

Small team willing to allow only one "ready to merge" changeset at a time

Use "commit" level anchoring and truck on. It is the default and you need
special flags to verify any other way.

Above small team that also wants to be able to squash and amend comments

Consider "tree" level anchoring which offers greater flexibility.

Understand the tradeoff here is that a bad actor can squash all commits into
one and change the commit message to offensive ascii art, but can't actually
change the code.

An organization with many in-flight diffs and reviews merging out of order

Consider "patch" level anchoring for authors/reviewers.

Understand that this can only certify that a given changeset was authored and
reviewed in isolation.

You -must- pair this with a CI system or maintainer(s) that will do one of:

  • Do only signed merge commits to master branch
  • Sign trusted HEADs/tag with "commit" or "tree" level anchoring post-merge

If you fail to do one of the latter steps to prevent history mutation you are
knowingly creating a security hole and are a terrible person.

@Ekleog
Copy link
Contributor

Ekleog commented Oct 6, 2018

“Above small team that also wants to be able to squash and amend comments”

Suggestion: add “without losing signatures that were done beforehand”.

Also, in the drawbacks, the one you listed is kind of laughable. Much more important is the possibility of replay attacks where the attacker can push a new commit (so no history rewriting) to make the tree back to what it was before at a vulnerable version, and re-use signatures from there (esp. given the “with” of the current RFC will not help you, as you're signing a tree anyway).

The only countermeasure to reduce the impact of this vulnerability I've heard is relying on time to check the signature… and time is a well-known offender. Most systems synchronize through NTP servers, which can be used to arbitrarily slow down or in certain cases even revert the passage of time. Also, even without considering that, the question of what is a reasonable time limit is hard to do. So the time-based “protection” sounds like a very very fragile protection to me: I think we should, in the threat model, assume that tree-id signing does allow replay attacks.

Finally, a tree-id signature can be only be signing a state, not a diff (taking terms from the current RFC): the tree-id doesn't hash the previous commit-id. Which means tree-id signatures should be much harder to do than commit-id signatures.

With these points in mind, do you still think that it'd be a better trade-off for a small team that wants to be able to squash and amend comments, rather than simply re-signing after the squash/amend?

I'm thinking the cost of having to re-review all the code at each signature would be a lot higher than the cost of re-signing. And, if signing by only checking one's review and trusting previous reviews, then an attack scheme may be:

  • User A tree-id-signs backdoor-introducing commit A
  • User B tree-id-signs legitimate commit B
  • User A rewrites commit A out of the history, merging it with commit B
  • User A finds any reason to make users force-pull so that they lose commit A
  • Now, user B is seen as the one introducing the backdoor
    Forensics may discover the trick if someone who fetched between commit A being introduced and being rewritten out of the history didn't git-gc, but it still stays a potential attack. So user B would indeed have to check the whole repository (or at least all the changes since their last signature) in order to be able to securely tree-id-sign.

Now, if you still believe this is some possibility you want to offer your users (I personally consider it just like GnuPG's interface streaming un-checked data and only adding a message at the end if an attack is detected: a high risk to security, because it's a dangerous interface), then I think it should come with the following points in the RFC:

  • Implementations MAY support generating and/or verifying the tree-id signature scheme
  • Implementations that support generating the tree-id signature scheme MUST make sure that the user understands the security risks it poses, that the user will have to review the whole repository at each tree-id signature, and that not all implementations support verifying it. Such implementations SHOULD at least hide this capability behind an ominous-sounding option, like --dangerous.

“An organization with many in-flight diffs and reviews merging out of order”

So… giving it more thought, I think:

  • There is just no secure way to automatically verify a whole branch of patch-id signatures, because we don't know whether they were actually meant to go together in this series
  • So patch-id signing is really detached-signing a patch in the purest sense of the word
  • So long as it's only that… why not?

If it's added to the RFC, I think it should be done with the following provisions:

  • Implementations MAY support generating the patch-id signature scheme. If they do so, they SHOULD make sure the user is informed of the fact that not all implementations support verifying it
  • Implementations MAY support verifying the patch-id signature scheme. If they do so, they MUST NOT integrate it with the regular verification command, instead they MUST make it a separate command that is designed exclusively to check one patch-id (as opposed to the regular verification command that would be used for automating verification of a series of commit-id [or tree-id] signatures)

These requirements are enough to implement your CI system where maintainers take care of ensuring correct ordering of the patch: then, maintainers would be the one doing the “authoritative for verification” signature, but they would keep the patch-id signature trail for forensics in case a backdoor is ever discovered. Also, it avoids having this section about “you are knowingly creating a security hole and are a terrible person”: the user cannot create a security hole, as no tools would exist to automate verification

@oxij I've quite reduced my opposition in this message compared to previous times. I think with the provisions I added to the end of my paragraphs to the RFC it should be relatively secure (ironically, I'm more afraid of tree-id signing than of patch-id signing with these restrictions, as patch-id signing would be well-limited by the tool). You maybe have an opinion stronger than me on this question?

@lrvick
Copy link
Author

lrvick commented Oct 6, 2018

@Ekleog your arguments on tree-id make sense to me based on my current limited understanding of tree-id. Personally I think the "patch-id + tagging signed commits to lock in order" flow makes the most sense for my use cases.

I did attempt to see if tree-id was defensible but perhaps it is not.

I don't actually the idea of mutating comments and squashing without changing the signature as I feel like it would, if nothing else, introduce social engineering vectors. patch-id+tagging flow where authors are saked to squash patches before review (if required by the maintainers) seems like a reasonable ask, and still allows the flexibility to have multiple in-flight diffs.

I don't have further arguments for including tree-id and I don't think I would want to use it with the tradeoffs it poses over commit/patch signing for the orgs I support.

If @daurnimator (who introduced the potential of tree-id in the first place) can come back with a solid use case for it in spite of these concerns, then your provisions also makes sense.

By the way, you may of noticed about me by now that I will play devils advocate for just about anything to encourage deeply thinking about problems. My own true stances are often much less defined at the outset, and debate is often the very thing that helps me define them.

Well argued! :)

@daurnimator
Copy link

If @daurnimator (who introduced the potential of tree-id in the first place) can come back with a solid use case for it in spite of these concerns, then your provisions also makes sense.

When you review/build code, you review the current state: you don't care what it was before. Signing a tree is the closest analogue to what most people actually review.

I suggested it due to two reasons:

  1. @lrvick wanted something that would work with a squashing workflow.
  2. Many projects add a "Reviewed-By" or other metadata to commit messages.

@Ekleog
Copy link
Contributor

Ekleog commented Oct 7, 2018

@daurnimator I think the Reviewed-By tag is only tagging the diff of the commit, in the context? At least that's how I've seen it used.

For instance, in nixpkgs (or other similarly million+-lines projects), it's impossible to imagine reviewing the whole state, because if you do so then by the time the review is finished a big part of it has already changed. That's why we switched the idea to having an initial state commit-id signature that would say “everything here is safe”, and then each commit would independently be reviewed in its context, with diff commit-id signatures. The state commit-id signature is very close to a tree-id signature, but it additionally enforce the parent, which should prevent replay attacks. The diff signature is basically Reviewed-By: signing that a commit looks legitimate in the context of its parent, assuming its parent is legitimate.

Now, if I understand correctly you're not strongly arguing in favor of tree-id, so that's just for the explanation of the reason why I think tree-id wouldn't be that useful. :)

@lrvick I also tend to playing the devil's advocate for the same reason, though not that much in these discussions… so I completely understand you :)


So I think if @oxij agrees, we could go this route for the commit-id, tree-id and patch-id signatures:

  • Commit-id review is as defined in the current draft
  • Tree-id review is not authorized
  • Patch-id review is authorized with the following limitations:
    • Implementations MAY support generating the patch-id signature scheme. If they do so, they SHOULD make sure the user is informed of the fact that not all implementations support verifying it
    • Implementations MAY support verifying the patch-id signature scheme. If they do so, they MUST NOT integrate it with the regular verification command, instead they MUST make it a separate command that is designed exclusively to check one patch-id (as opposed to the regular verification command that would be used for automating verification of a series of commit-id signatures)

@lrvick
Copy link
Author

lrvick commented Oct 7, 2018

Implementations MAY support verifying the patch-id signature scheme. If they do so, they MUST NOT integrate it with the regular verification command, instead they MUST make it a separate command that is designed exclusively to check one patch-id (as opposed to the regular verification command that would be used for automating verification of a series of commit-id signatures)

-totally- separate command seems ambiguious here. I was thinking "git signatures show" could show all signatures and what type they are at a glance, and that "git signatures verify" would only verify against commit signing unless you pass it an explicit flag to verify patch-ids with strict warnings on intended use cases around this in the documentation.

@Ekleog
Copy link
Contributor

Ekleog commented Oct 7, 2018

Well, my point was actually that git signatures verify MUST NOT verify patch-id signatures. Because git signatures verify would do the following algorithm:

  • Find a state-signed commit
  • For each commit going forward, verify its commit-id signature, until the top of the branch

This would be rendered unsafe by patch-id, and no amounts of warnings will prevent users from doing it anyway (see Efail, the unsafe behaviour was documented, yet almost all clients handled it wrong)

However, patch-id signatures MAY be verified securely so long as the user cannot expect it to verify more than what it can verify: by having a git signatures verify-patch command that ONLY takes as input a range of commits, and verify that the patch-id corresponding to this range of commit has been reviewed.

I think this command is enough to implement your “An organization with many in-flight diffs and reviews merging out of order” model, and in addition it would be safer than with the full-blown verify command, as people couldn't rely on it doing safety checks it cannot do :)

@oxij
Copy link
Member

oxij commented Oct 9, 2018 via email

@Ekleog
Copy link
Contributor

Ekleog commented Oct 9, 2018

Why can't authors just resign the result?

I can think of a use case for this: when authors don't have access to the full repository their commits are being rebased into, for $corporatereasons.

Now, thinking about it more… patch-id signing as I thought it (ie. as a paper trail for maintainers down the road to prove they didn't invent the patches themselves) can actually be implemented through commit-id signing:

  • Author commits to their branch, reviews, pushes commit and review
  • Reviewer pushes review of author's commit
  • Maintainer down the road tasked with integrating incoming patches:
    • Pulls commit from author's branch
    • Check that commit is signed by both author and reviewer
    • Rebases, squashes or whatever it wants the patch to master, and signs the rebase

Now, maintainer's commits are technically signed only by maintainer, but if any problem is found in them, maintainer can be taken as responsible for it until they produce the review proof they have from author and reviewer. Some (site-specific?) tooling could in addition link previous review proofs and maybe automatically verify them and check the patch-id didn't change, but that doesn't require direct patch-id signing (thanks @oxij for the demonstration of stability issues of it :))

@oxij
Copy link
Member

oxij commented Oct 9, 2018 via email

@Ekleog
Copy link
Contributor

Ekleog commented Oct 9, 2018

Actually, this resigning could even be done by CI systems.

So the idea of replacing patch-id by commit-id + keep reference to commit & parent sounds good to me, as tooling can make the two just as easy to use (well, so long as you're not emailing patches around, but if you're emailing patches around then patch-id signing wouldn't help you anyway, as it's stored in an obscure blob with an obscure format in an obscure reference of the repository… just sign the mail :))

@lrvick
Copy link
Author

lrvick commented Oct 23, 2018

Now, maintainer's commits are technically signed only by maintainer, but if any problem is found in them, maintainer can be taken as responsible for it until they produce the review proof they have from author and reviewer.

I think this adds a lot of complexity, and patch-id should be very stable now and the output with the flags used in git-signatures should never change.

Unless I am missing something this requires that a maintainer keep all their proofs around by hand and any feature branches be preserved. This makes it very difficult for there to be a first-class tool to show the full paper-trail for a given commit in master at a glance. In practice I feel this means a maintainer will just have exclusive trust for master. Auditing should be -easy- and be able to be done by anyone, not just the maintainer which may be hit by a bus.

Also consider popular (but admittedly shitty) review tools like Phabricator that fully detach a patch and submit it to a database for review. They are then later applied via "arc land" by a maintainer often to a totally different repo for, as stated, $corporatereasons. git-wotr signatures could be attached and re-imported into the new repo.

Also consider all the many open source projects that actually only accept patches by email. A signature on the email can't be easily imported into the upstream repo. A patch attachment with sibling attachment containing a one-line git-wotr signature that contains a patch-id could.

It is also possible I am totally misunderstanding your suggestion here. If so then please provide some specific examples so my mind can grock this :)

@Ekleog
Copy link
Contributor

Ekleog commented Oct 23, 2018

@lrvick So what I'm saying is: A patch-id is a restricted commit-id, with less determinism.

In your two examples, you transfer a patch as well as a patch-id signature. What I'm saying is, you can replace a patch-id signature by some blob a bit bigger (handled by your tool) that's composed of:

  • The commit-id signature on the relevant commit
  • The output of git cat-file -p [hash] and git cat-file -p [hash]^
  • A tarball of the tree in git cat-file -p [hash]^ (optional, see below)
  • The diff between the two trees at [hash]^ and [hash]

With these informations, you actually convey exactly as much information as a patch-id signature, except:

  • It's safe, ie. the user has never signed something they didn't expect to sign (which is the case with mere patch-id signing rebased on something else)
  • It's reproducible, and doesn't depend on details of git's diffing algorithm, which means that review proofs won't be magically invalidated by a git upgrade (which does happen, even if git patch-id rarely changes the patches you'll feed it via stdin generated by git diff-tree will change, and so its output will likely change too)
  • The tarball of the tree in git cat-file -p [hash]^ you would have to convey anyway, given you planned to attach patch-ids with tree-ids anyway, so the tree's tarball must in some way reach the target system (because otherwise a signature really wouldn't mean anything, like imagine a diff to README… it can be applied to literally every project in existence without invalidating the signature)
  • It actually provides strong traceability proofs, given the maintainer can say “yes my rebase was legitimate, and the proof is that before the rebase the code was just as wrong as after”

In exchange, the tarball of the tree before the commit must be exchanged in some way… but anyway this will likely be a commit on master, which would not need any exchange. And even if it was actually required, such an exchange would have been required for patch-id signing anyway, unless willing to take part in the security circus by signing patches just to tick off the case “yeah I sign my code” :)

@lrvick
Copy link
Author

lrvick commented Oct 23, 2018

Your stated strategy aready assumes we need a stable diff command:

The diff between the two trees at [hash]^ and [hash]

The explicit way I am using patch-id and diff-tree (As suggested by @rafasc) can be reasonably expected to never change:

git -c diff.indentHeuristic=false diff-tree -p "$1"..HEAD | git patch-id --stable

This is dramatically simpler than the very large and complex blob, and complex programming logic that would be needed to avoid it.

And yes there are plenty of reasons one might want to ship a valid patch to many different projects. Say for instance re-writing a core crypto function that impacts 100 different bitcoin forks. Maintainers of each fork pull in that upstream patch and consumers have strong assurance the maintainer didn't make it up. Deciding the -order- it is merged is still on the maintainer. They can just prove where the patch originated and perhaps reference any testing/research/review that went with it on top of a similar codebase for performance or whatever else.

As discussed elsewhere: patch-id -is- a lower verification level than commit-id but the most critical use case it supports is allowing multiple in-flight patches to be merged into a tree where the order is determined by the maintainers. The maintainers can pull in patches and responsible for making sure they are merged in a safe order. The authors and reviewers will of course be signing commit-ids as well, and that will allow maintainers to know the context they reviewed under and consider that in their choice of merge order.

Patch-ids are useful for trust between various stages of a release process but only signed object ids on master or tagged releases should generally be trusted by downstream consumers as only that can lock in the order.

My current understanding of your proposal is that it adds significant complexity and throws out the possibility of having multiple PRs all staged at once for merging by maintainers in any order of their choosing. This is impractical for almost every company feature-branch workflow I am aware of.

@Ekleog
Copy link
Contributor

Ekleog commented Oct 23, 2018

Your stated strategy aready assumes we need a stable diff command:

No it doesn't. It just assumes that you can apply an old diff: the diff is here only as an optimization to not store the two full trees. The two full trees, once generated using the stored diff, can then be diff'd with the now-current diff, which is then using a single version and thus not having to care about stability.

Basically I'm not storing the patch-id, but the full diff.

The explicit way I am using patch-id (As suggested by @rafasc) and diff-tree can be reasonably expected to never change

You disabled git/git@33de716 . Right. So you're missing git/git@5580b27 . And likely others I don't know about. And more that will come in the future.

I think when even patch-id's documentation only claims “relatively stable” (which is, assuming identical files as the input, so not even considering git diff volatility), expecting things to never change is a big, big bet on the future. Especially when half the pipeline changes regularly.

And yes there are plenty of reasons one might want to ship a valid patch to many different projects. Say for instance re-writing a core crypto function that impacts 100 different bitcoin forks. Maintainers of each fork pull in that upstream patch and have strong assurance the maintainer didn't make it up. Deciding the -order- it is merged is still on the maintainer. They can just prove where the patch originated and perhaps reference any testing/research/review that went with it on top of a similar codebase for performance or whatever else.

Well, for this use case it's really just a rebased-from-commit [commit-hash] annotation to the signature of the new commit, so it doesn't even require the machinery I was putting forward.

My current understanding of your proposal is that it adds significant complexity and throws out the possibility of having multiple PRs all staged at once for merging by maintainers in any order of their choosing. This is impractical for almost every company I am aware of.

Said complexity is only when you move patches around in a way that is not using git, ie. phabricator or email. When you have direct git access, the tool can just automatically fetch/push the hash referenced by rebased-from-commit [commit-hash], and be done with it.

And even then, it is only complexity incurred by the tooling wishing to support this use case, as the inputs and outputs will be the same: a patch and a signature on said patch. Except the fact that the patch signature based on commit-id will be more robust, at the expense of being more heavy in case the commit on which the patch has been initially reviewed isn't publicly accessible (in which case it'd have to add a tree tarball to the signature to be secure) -- I assume the weight of the diff itself would be negligible in most cases.

However I'm not sure I explained my point clearly: I'm not saying you should drop the idea of signing patches. I guess some specific use case could require it, even though I'm not willing to support these in the tool I'm expecting to write unless I get paid for it.

What I am saying is that you can sign patches by signing their commit-id with a regular commit [hash] diff signature. With the mechanism explained in my previous message as well as the first paragraph of this message.

The advantages are:

  • The signer is safe (never signed ambiguous things)
  • The maintainer is safe (the rebased commit keeps reference of the initial commit -- actually it could even be included as a parent if this is wanted, so that it gets automatically fetched by every downstream). This is actually better proof than just a patch-id for the maintainer, because it also proves that the upstream patch did mean to apply on code similar to the one to which it has been rebased
  • The users can set the verification policy they want (trusting the maintainer, trusting the maintainer iff the commit is a rebase of a commit with a signature by someone trusted, etc.)
  • It doesn't break awfully on the next git update

@lrvick
Copy link
Author

lrvick commented Oct 23, 2018

I guess I am having a hard time following the exact setup you are recommending here, so I'll take a stab at trying to translate into pseudocode to see if I am following.

Lets say I make a PR against the current HEAD of master.

I append a note signature to the HEAD commit of my branch containing something like:

"$commit-id" "$rebase-commit-id" $(sign "$commit-id" "$rebase-commit-id")

Where $rebase-commit-id was the current HEAD this diff was signed against.

From there a several other people make similar PRs in feature branches.

One of these totally unrelated changes merges ahead of mine thus making every other signed PR/diff including mine no longer valid against the current head of master.

You as maintainer in order to verify my commit would do something like:

  • gpg_verify "$commit_id $rebase-commit-id"
  • Verify the that the fact $rebase-commit-id is no longer HEAD of master won't cause any issues
  • append new signature to commit with updated $rebase-commit-id
  • merge branch to master

A downstream consumer of the repo could then potentially verify a ref in master under a policy like:

  • Require at least 1 signature from author tolerating $rebase-commit-id mismatch
  • Require at least 1 signature from reviewer tolerating $rebase-commit-id mismatch
  • Require at least 1 signature from maintainer -not- tolerating $rebase-commit-id mismatch

Does this workflow example sound in line with what you are proposing?

@rafasc
Copy link

rafasc commented Oct 23, 2018

Some of the things I mentioned to @lrvick , I assumed this was being used on a certain context but reading this I clearly misinterpreted that context.

You disabled git/git@33de716 . Right. So you're missing git/git@5580b27 . And likely others I don't know about. And more that will come in the future.

True, patch-id was not designed to be stable in this sense. Its purpose is to just give a 'good enough' confirmation that the patch is already applied upstream. And if it fails, it's usually no big deal, as you expect a human to double check and deal with it.

the rebased commit keeps reference of the initial commit -- actually it could even be included as a parent if this is wanted, so that it gets automatically fetched by every downstream).

https://github.com/mhagger/git-imerge/tree/v1.1.0 Does this and calls it rebase-with-history.

But ...

A rebased branch cannot be trusted just because it was made from trusted commits. Your signed and harmless commit could introduce a problem because it was applied by someone on top of something else harmless by itself.

Similar situation occurs in merges and patch based approach, as you can introduce changes in the merge/apply commit itself.

But at that point, the responsibility falls on the person who did the rebase/merge/apply, not the original author of the commits.

Using your signature over notes extension, you could add your signature back to the upstream commits meaning: "I verify that the maintainer correctly integrated my changes".

And I don't know of any good alternative without compromising something. Be it security, clean history, or ease to use.

@lrvick
Copy link
Author

lrvick commented Oct 23, 2018

I'll conceed on the patch-id point if I can confirm I am understanding the alternative correctly. Clearly they are not as stable as I thought.

A rebased branch cannot be trusted just because it was made from trusted commits. Your signed and harmless commit could introduce a problem because it was applied by someone on top of something else harmless by itself.

Yeah that is exactly the attack we want to be able to deal with. Signing commit-ids only is most secure but we need to have a secondary solution to support typical workflows with multiple in-flight diffs. Specifically we need a signing scheme that allows different parties to certify proof of authorship, proof of review, and merge order separately at different stages of the deployment lifecycle.

Specfically we need to support a flow like:

  • Authors signs changes in respective feature branchs
  • Reviewers sign changes in respective feature branchs
  • Maintainer verifies signatures in feature branch
  • Maintainer merges changes and signs a tag/commit locking in the -order- the changes should be in
  • Consumer can use information attached to given tag/commit to verify
    • All changes at this ref were signed by respective authors
    • Al changes at this ref were reviewed by trusted reviewers
    • The order the changes merged in was signed by a trusted maintainer

Signing both the commit-id and patch-id would allow all of the above to be possible -if- patch-id were stable. I am trying to understand an alternative that still can check all these boxes.

@Ekleog
Copy link
Contributor

Ekleog commented Oct 24, 2018

@rafasc I think we're pretty much in total agreement :)

@lrvick The workflow you exposed unfortunately doesn't work perfectly, because the commit-id would be changed by rebasing, and may turn out not to be pushed to the maintained repository (or to be git gc'd out). Here is what I'm advocating (taking your bullet point list as a basis):

  1. Master branch starts at commit C-orig
  2. Author writes and signs changes:
    1. Author works from master branch, creating a commit C-pr, that has C-orig as parent (to simplify the example let's assume single-commit branches, the scheme would naturally expand).
    2. Author signs C-pr with a regular commit-id signature
  3. Other authors do the same, things are merged, master branch is now at C-new
  4. Reviewer checks and signs changes:
    1. Reviewer checks C-pr
    2. Reviewer signs C-pr with a regular commit-id signature
  5. Maintainer wants to rebase C-pr on top of C-new because they don't like merges:
    1. Maintainer checks commit-id signatures on C-pr
    2. Maintainer rebases C-pr on C-new, this gives C-pr-new
    3. Maintainer signs C-pr-new with a commit-id signature that adds the following metadata fields (be it direct metadata or review message or whatever): C-orig's hash, the output of git cat-file -p C-pr and the signatures by Author and Reviewer
    4. Maintainer pushes C-pr-new as new master
  6. Consumer verifies that the changes were ordered by Maintainer by just checking the commit-id signatures
  7. Consumer verifies that the changes were authored by Author and reviewed by Reviewer by (automated by the tooling):
    1. Checking out C-orig as given in Maintainer's commit-id signature
    2. Patching it with the diff between C-pr-new^ and C-pr-new (making sure the patch applies cleanly)
    3. Committing it with the metadata present in the output of git cat-file -p C-pr present in Maintainer's commit-id signature
    4. Checking that the signatures of Author and Reviewer present in Maintainer's commit-id signature apply correctly on this resulting commit

This way of doing things even allows successive rebases to all keep traceability of who did which rebase and who assumed the other's rebase was correct. And it doesn't rely on stability of the diff, as the diff is just used to be applied on some tree immediately.

It doesn't handle rebase conflicts, because if there is a rebase conflict then it's no longer Author's work and has no longer been reviewed by Reviewer, so these signatures should really be dropped.

@lrvick
Copy link
Author

lrvick commented Oct 24, 2018

Okay. I better understand what you are proposing however it is very complex and is going to make implementation much harder than the patch-id setup.

As far as I can tell there is no change in attack surface between your proposal and using patch-ids. The only reason we are doing all these gymnastics to avoid patch-ids then, is out of potential that that may not be stable.

Coming at this from a different angle, lets say either patch-id is stable, or we dictate our own alternative to patch-id.

For the sake of discussion lets say we put in the spec a specific deterministic diff algorithim then simply sha256 the output. Could call it "diff-id"

We would then keep the signing format dirt simple by always signing the commit-id and this custom but stable diff-id. From there can use all the patch-id workflows as I outlined.

TL;DR: Assume patch-id is stable or we make a drop-in stable alternative. Is this not then a better fit and easier to implement/verify for the outlined workflows?

@Ekleog
Copy link
Contributor

Ekleog commented Oct 24, 2018 via email

@lrvick
Copy link
Author

lrvick commented Oct 25, 2018

Okay. Stewing on this and I am warming up to this idea provided I can find a simple to reason about set of steps to sign/verify by hand. I appreciate you expanding your explanations to address my use cases.

I am currently most unclear on how this step would work.

Committing it with the metadata present in the output of git cat-file -p C-pr present in Maintainer's commit-id signature

If I am understanding this correctly your idea is recreate the PR branch that was signed, and make sure everything checks out in that context and that it gets the same commit hash. I am unclear how to use the cat-file -p output to recreate a commit in a branch and get the same hash.

@Ekleog
Copy link
Contributor

Ekleog commented Oct 25, 2018

Yes, the idea is exactly to recreate the PR branch from the data, using the diff between C-pr-new^ and C-pr-new (to check that the rebase was indeed a no-op) and the metadata from the output of git cat-file -p.

As for how exactly:

$ #
$ # SETTING THE TEST REPOSITORY UP
$ #
$ git init test
Initialized empty Git repository in /tmp/test/.git/
$ cd test/
$ touch test
$ for i in `seq 1 100`; do echo aaaa >> test; done
$ git add test
$ git commit -m "."
[master (root-commit) 40eb09d] .
 1 file changed, 100 insertions(+)
 create mode 100644 test
$ git checkout -b test-branch
Switched to a new branch 'test-branch'
$ sed -i '3s/aaaa/bbbb/' test
$ git commit -am "change aaaa to bbbb line 3"
[test-branch b2a250f] change aaaa to bbbb line 3
 1 file changed, 1 insertion(+), 1 deletion(-)
$ git checkout master
Switched to branch 'master'
$ sed -i '50s/aaaa/cccc/' test
$ git commit -am "change aaaa to cccc line 50"
[master cd938ad] change aaaa to cccc line 50
 1 file changed, 1 insertion(+), 1 deletion(-)
$ git checkout test-branch
Switched to branch 'test-branch'
$ git cat-file -p HEAD > metadata
$ git rebase master
First, rewinding head to replay your work on top of it...
Applying: change aaaa to bbbb line 3
$ cd ..
$ git clone test test2
Cloning into 'test2'...
done.
$ cd test2
$ #
$ # VERIFICATION
$ #
$ # here test2 never saw the initial test-branch
$ # now we're entering the step of verifying the validity of the commit
Branch 'master' set up to track remote branch 'master' from 'origin'.
Switched to a new branch 'master'
$ cat ../test/metadata  | grep parent
parent 40eb09de44dc69e839d3b6ad0f922ae1e8ebff6a
$ git checkout 40eb09de44dc69e839d3b6ad0f922ae1e8ebff6a
Note: checking out '40eb09de44dc69e839d3b6ad0f922ae1e8ebff6a'.
[ ... skipped for brevity ... ]
$ git apply <(git diff master test-branch)
$ git add .
$ git write-tree
f565b40932ab5954dd4b43d47f026bd4c9670b9e
$ cat ../test/metadata | grep tree
tree f565b40932ab5954dd4b43d47f026bd4c9670b9e
$ # all is going good up to now (the check is important security-wise)
$ git hash-object -t commit ../test/metadatab2a250fe7648ce5dea07d50c14083f9ba8ab0353
b2a250fe7648ce5dea07d50c14083f9ba8ab0353
$ # it's the same hash than in the original branch, so we can check signatures on it

Maybe there's a way to do it without messing with the staging area, by using git worktrees or similar.

Now, disclaimer: I have written this in the github comment interface, meaning that I haven't security-checked it. Intuitively it should be OK, but there may be something I missed. No warranties, yadda yadda.

@lrvick
Copy link
Author

lrvick commented Oct 25, 2018 via email

@oxij oxij mentioned this issue Dec 15, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants