Rework pkg/tlog/entry.go to not depend on sigstore/rekor/pkg/types #28

Closed
wants to merge 1 commit into from

Conversation

steiza
Member

@steiza steiza commented Nov 9, 2023

Summary

See #23

This lets us reduce the size of the sigstore-go binary by over 4 MB, since sigstore/rekor/pkg/types has many dependencies.

Release Note

NONE

Documentation

N/A

@steiza steiza requested a review from a team November 9, 2023 16:54
See #23

This lets us reduce the size of the sigstore-go binary by over 4 MB,
since sigstore/rekor/pkg/types has many dependencies.

Signed-off-by: Zach Steindler <[email protected]>
@haydentherapper
Contributor

Long-term, we'd likely want all rekor types supported in sigstore-go. Do you think it'd be better to figure out how to minimize dependencies in rekor/pkg/types?

Contributor

@haydentherapper haydentherapper left a comment

I would prefer not to have duplicated parsing logic here, and this would skip correctness checks that Rekor has implemented. I would rather we refactor Rekor to trim down dependencies for each rekor type, or at least these critical ones.

@@ -66,21 +67,24 @@ func NewEntry(body []byte, integratedTime int64, logIndex int64, logID []byte, s
if err != nil {
return nil, err
}
rekorEntry, err := types.UnmarshalEntry(pe)
Contributor

One downside to removing this is that certain correctness checks that occur during unmarshalling will now be skipped.

logEntryAnon: models.LogEntryAnon{
Body: base64.StdEncoding.EncodeToString(body),
IntegratedTime: swag.Int64(integratedTime),
LogIndex: swag.Int64(logIndex),
LogID: swag.String(string(logID)),
},
kind: pe.Kind(),
version: rekorEntry.APIVersion(),
}
Contributor

is kind needed?

Member Author

We retain the pe object on rekorEntry, so we just call rekorEntry.Kind() when we need it (see below).

@steiza
Member Author

steiza commented Nov 9, 2023

> I would rather we refactor Rekor to trim down dependencies for each rekor type, or at least these critical ones.

I'm not as familiar with the Rekor codebase, but this could be investigated further. It's possible this could be as simple as removing go-openapi/strfmt from rekor/pkg/types, but I haven't confirmed that.

@haydentherapper
Contributor

It might be due to some of the Rekor packages like https://github.com/sigstore/rekor/blob/main/pkg/types/hashedrekord/v0.0.1/entry.go#L36-L38 that should be removed. I'm not sure if there's tooling for this, but it'd be nice to see what packages are causing the large size.

@steiza
Member Author

steiza commented Nov 9, 2023

I took a quick look at Rekor, but wasn't able to come up with a simple proposal to reduce dependencies. I think a deeper look is probably warranted before giving up completely on a Rekor refactor.

I hesitate to call it "tooling", but here's the approach I'm taking:

  1. During the build, ask the linker to output the call graph:
go build -ldflags="-c" -o sigstore-go cmd/sigstore-go/main.go 2> callgraph.txt
  2. After the build, use nm to output the symbol table, with sizes:
go tool nm -size sigstore-go > nm.txt
  3. Use a very fragile Python script to estimate the "cost" of a function by adding up the size of the symbols it calls:
from collections import defaultdict
import re
import sys


def main(nm, callgraph):
    account = defaultdict(int)      # caller -> summed size of the symbols it calls
    symbol_size = defaultdict(int)  # symbol -> size reported by nm

    with open(nm, "r") as fd:
        for line in fd:
            # Skip lines that start with whitespace (no address column).
            if line.startswith(" "):
                continue

            # Expected columns: address, size, type, symbol name.
            _, size, _, symbol = re.split(r'\s+', line)[0:4]

            symbol_size[symbol] = int(size)

    with open(callgraph, "r") as fd:
        for line in fd:
            # Skip comments and lines that are not "<caller> <verb> <callee>".
            if line.startswith("#") or len(line.split(" ")) < 3:
                continue

            caller, _, callee = line.split(" ")[0:3]

            account[caller] += symbol_size[callee.strip()]

    # Print callers from most to least expensive, then the grand total.
    items = list(account.items())
    items.sort(key=lambda each: each[1], reverse=True)
    for each in items:
        print(each[0], each[1])

    print(sum(account.values()))


if __name__ == "__main__":
    if len(sys.argv) < 3:
        print("Usage: {} nm.txt callgraph.txt".format(sys.argv[0]))
    else:
        main(sys.argv[1], sys.argv[2])
  4. Look at the list for things that look like they don't belong (like lots of mongo-driver/bson)

  5. Look at the call graph to figure out who is referencing those objects

  6. Comment out suspected code, recompile, and see if the binary is any smaller!
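
To make the "things that don't belong" step easier to scan, the symbol sizes can also be rolled up per package. The following is only a rough sketch, not part of the original script: it assumes each defined symbol in nm.txt has the columns address, size, type, name, and `package_of` is a hypothetical heuristic that treats everything up to the first "." after the last "/" as the package path, which will misattribute some symbols (generic instantiations, for example).

```python
from collections import defaultdict


def package_of(symbol):
    """Hypothetical heuristic: package = text up to the first '.' after the last '/'."""
    slash = symbol.rfind("/")
    dot = symbol.find(".", slash + 1)
    return symbol if dot == -1 else symbol[:dot]


def sizes_by_package(nm_lines):
    """Sum symbol sizes per package, largest first.

    Assumes each useful line looks like "<address> <size> <type> <name>";
    anything that does not fit that shape is ignored.
    """
    totals = defaultdict(int)
    for line in nm_lines:
        fields = line.split()
        if len(fields) < 4 or not fields[1].isdigit():
            continue
        totals[package_of(fields[3])] += int(fields[1])
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

Calling `sizes_by_package(open("nm.txt"))` and printing the first few dozen entries should surface a large unexpected package (such as a bson driver) more quickly than reading per-symbol output.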

@haydentherapper
Contributor

Another idea for an iterative approach could be for each package we import, look at its dependencies and order by most minimal usage. Copy-paste the functions used and see if the size decreases significantly. If so, then we'll note that dependency needs refactoring if it's under our control, or if it's a third-party dependency, we could just copy in the function.
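
One possible way to find the "most minimal usage" candidates is to count how many distinct symbols our own code calls in each external package, reusing a linker call graph. This is a sketch under assumptions: it expects lines of the form "<caller> calls <callee>", `own_prefix` is whatever import-path prefix identifies our code, and `package_of` is the same kind of hypothetical heuristic for splitting a symbol into package and name.

```python
from collections import defaultdict


def package_of(symbol):
    """Hypothetical heuristic: package = text up to the first '.' after the last '/'."""
    slash = symbol.rfind("/")
    dot = symbol.find(".", slash + 1)
    return symbol if dot == -1 else symbol[:dot]


def external_usage(callgraph_lines, own_prefix):
    """Return (package, distinct symbols called) pairs, least-used packages first.

    Only edges from our own code (caller starts with own_prefix) into
    other packages are counted.
    """
    used = defaultdict(set)
    for line in callgraph_lines:
        fields = line.split()
        if line.startswith("#") or len(fields) < 3:
            continue
        caller, callee = fields[0], fields[2]
        if caller.startswith(own_prefix) and not callee.startswith(own_prefix):
            used[package_of(callee)].add(callee)
    counts = ((pkg, len(symbols)) for pkg, symbols in used.items())
    return sorted(counts, key=lambda item: (item[1], item[0]))
```

Packages near the top of the result, where only one or two functions are used, would be the first candidates for the copy-and-measure experiment.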

@haydentherapper
Contributor

haydentherapper commented Nov 9, 2023

I think this also warrants a larger conversation about what libraries are authoritative for verification logic throughout the Sigstore stack. I've thought about sigstore-go as glue for services. Ideally, I would like the following (and this is just my thoughts! I'll file an issue to discuss more):

  • Rekor libraries
    • Type construction and parsing
    • Rekor client
    • Inclusion/consistency proof verification (if we need to simplify the existing trillian/transparency-dev APIs, otherwise use those directly)
  • Fulcio libraries
    • Short-lived certificate verification (FWIW, this is fairly lightweight which is why we see it duplicated in each implementation currently)
    • Fulcio gRPC/HTTP clients
  • sigstore-go
    • Bundle consumption & generation
    • Golang logic for "code signing and transparency"
      • Glue between Signature/Attestation, Certificate/Key, Timestamp, and Inclusion Proof
      • Certificate could be Fulcio, could be self-managed PKI, etc.
      • Designed such that we can support both the expected Sigstore path and other signing/verification flows
  • sigstore/sigstore
    • Shared between Sigstore services and Golang clients (sigstore-go, gitsign)
    • Maybe should get folded into sigstore-go, but circular dependencies can occur

I'd like for maintainers to align on this, as it'll drive what the dependency tree should look like and how we can simplify it.

Edit: Also previous discussion around this: sigstore/sigstore#678

@kommendorkapten
Member

I would love to see if we can slim down Rekor, as that would benefit multiple clients, not just the sigstore-go package.

@steiza
Member Author

steiza commented Nov 15, 2023

I 100% agree that the Rekor libraries should own type construction and parsing, and that ideally sigstore-go would depend on those libraries.

... but I've spent some time over the past few days trying to see if there was an easy way to reduce the size of rekor/pkg/types (and in particular rekor/pkg/pki / rekor/pkg/pki/x509) and wasn't able to come up with anything. If anyone is able to reduce the footprint of those packages, we would definitely make use of them in sigstore-go!

But as it stands, we have a bit of a trade-off. On one hand, we have the more thorough correctness checks in rekor/pkg/types, and on the other we have the substantial reduction of ~4 MB out of ~25 MB total. I still think that reduction in size is worth it.

@haydentherapper
Contributor

Personally, I would prefer larger size over duplication. We impact readability and testability with duplication. When I had suggested minimizing dependencies, I was thinking more about minimizing dependencies focusing on unmaintained or minimally maintained dependencies.

I’m also surprised those packages are so large, is this the bson dependency again or something else?

@steiza
Member Author

steiza commented Nov 16, 2023

> When I had suggested minimizing dependencies, I was thinking more about minimizing dependencies focusing on unmaintained or minimally maintained dependencies.

Yeah, the problem is the dependency has to be at the right level - something we depend on directly that we can change our usage of. If we depend on A, which depends on B, which has a bunch of dependencies in C that are big, it isn't straightforward to untangle that (let alone upstream a change to B).

> I’m also surprised those packages are so large, is this the bson dependency again or something else?

Unfortunately for the time being I've run out of time to do additional research in Rekor's codebase. Like I mentioned, I was not able to identify something that we could easily swap out for a similar decrease in size. It's possible it exists, and I would love to see someone post a PR with that alternative.

> Personally, I would prefer larger size over duplication.

I think it depends on the size! @kommendorkapten came up with a simplified version of the problem statement:

sigstore/sigstore-go$ cat main.go 
package main

import (
	"fmt"
	dsse "github.com/sigstore/rekor/pkg/types/dsse/v0.0.1"
)

func main() {
	var aa dsse.V001Entry
	fmt.Println(aa)
}
sigstore/sigstore-go$ go build main.go 
sigstore/sigstore-go$ ls -lh main
-rwxr-xr-x  1 steiza  steiza    21M Nov 16 14:01 main

21 MB!! Compared to just 11 MB using the generated model:

sigstore/sigstore-go$ cat main.go 
package main

import (
	"fmt"
	"github.com/sigstore/rekor/pkg/generated/models"
)

func main() {
	var aa models.DSSEV001Schema
	fmt.Println(aa)
}
sigstore/sigstore-go$ go build main.go 
sigstore/sigstore-go$ ls -lh main
-rwxr-xr-x  1 steiza  steiza    11M Nov 16 14:02 main

@steiza
Member Author

steiza commented Nov 30, 2023

Okay, so it sounds like the next step is to try to upstream slimming into Rekor. I don't have any near-term plans to do so, but whoever picks that up should feel free to refer back to this PR.

@steiza steiza closed this Nov 30, 2023
3 participants