Storing VEX files in a dedicated directory within Git repositories #46

Open
knqyf263 opened this issue May 1, 2024 · 8 comments

@knqyf263
Contributor

knqyf263 commented May 1, 2024

Description

I would like to open a discussion regarding the file path convention for storing OpenVEX files within a Git repository. In the example of Cilium, the filename .openvex.json is used. However, considering factors such as future OpenVEX version upgrades, the need to retain older files, storing individual VEX files for the OCI artifact and the project, and accommodating multiple VEX formats like OpenVEX and CSAF, I think it would be better to store VEX files under a dedicated directory like .vex/ rather than using a single file.

Example

For example, a filename format would be like NAME.FORMAT.json for storing the VEX files. With this approach, the file path would look like this:

  • .vex/cilium-oci.openvex.json
  • .vex/cilium-golang.openvex.json
  • .vex/cilium.csaf.json
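
The naming convention above can be sketched in a few lines of Python. This is only an illustration of the proposed `.vex/NAME.FORMAT.json` layout; the helper names `vex_path` and `parse_vex_path` are hypothetical and not part of any existing tool.

```python
from pathlib import PurePosixPath

VEX_DIR = ".vex"  # directory name proposed in this issue


def vex_path(name: str, fmt: str) -> str:
    """Build a repository-relative path of the form .vex/NAME.FORMAT.json."""
    if not name or "/" in name:
        raise ValueError(f"invalid name: {name!r}")
    if not fmt or "." in fmt or "/" in fmt:
        raise ValueError(f"invalid format: {fmt!r}")
    return str(PurePosixPath(VEX_DIR) / f"{name}.{fmt}.json")


def parse_vex_path(path: str) -> tuple[str, str]:
    """Split .vex/NAME.FORMAT.json back into (NAME, FORMAT)."""
    p = PurePosixPath(path)
    if p.parent.name != VEX_DIR or p.suffix != ".json":
        raise ValueError(f"not a VEX path: {path!r}")
    # The format is the last dot-separated part of the stem, so NAME may
    # itself contain dots.
    name, sep, fmt = p.stem.rpartition(".")
    if not sep or not name or not fmt:
        raise ValueError(f"expected NAME.FORMAT.json: {path!r}")
    return name, fmt
```

For instance, `vex_path("cilium-oci", "openvex")` yields the first path in the list above, and `parse_vex_path(".vex/cilium.csaf.json")` recovers the name and format of the last one.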

When storing VEX files in a Git repository, there is a challenge in associating package names with repository names for most ecosystems other than Go. However, users can still utilize the VEX files by manually downloading them, and defining a standard location for these files is beneficial.

I welcome any feedback or thoughts on this proposal.

@luhring
Contributor

luhring commented May 7, 2024

Makes sense to me!

@ritazh

ritazh commented Jul 9, 2024

How do we differentiate between the different versions and architectures of a component? For container images, we may need the image digest.

@knqyf263
Contributor Author

knqyf263 commented Jul 9, 2024

In my opinion, if the type, namespace and name components in PURL are equal, they should be stored in the same VEX file as OpenVEX supports multiple products.

The definition of each component is:

  • scheme: this is the URL scheme with the constant value of "pkg". One of the primary reason for this single scheme is to facilitate the future official registration of the "pkg" scheme for package URLs. Required.
  • type: the package "type" or package "protocol" such as maven, npm, nuget, gem, pypi, etc. Required.
  • namespace: some name prefix such as a Maven groupid, a Docker image owner, a GitHub user or organization. Optional and type-specific.
  • name: the name of the package. Required.
  • version: the version of the package. Optional.
  • qualifiers: extra qualifying data for a package such as an OS, architecture, a distro, etc. Optional and type-specific.
  • subpath: extra subpath within a package, relative to the package root. Optional.

https://github.com/package-url/purl-spec/blob/master/PURL-SPECIFICATION.rst#package-url-specification-v10x

Different version, qualifiers and subpath will be in the same VEX document.
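
As a rough sketch of this grouping rule, the snippet below derives a (type, namespace, name) key from a purl string and groups purls by it. It is a deliberately simplified parser written for illustration (no percent-decoding and no type-specific normalization), not a replacement for a real purl library. The function names are hypothetical, and the `example` package in the usage note is made up.

```python
from collections import defaultdict


def purl_group_key(purl: str) -> tuple[str, str, str]:
    """Return (type, namespace, name), ignoring version, qualifiers, subpath."""
    if not purl.startswith("pkg:"):
        raise ValueError(f"not a purl: {purl!r}")
    rest = purl[len("pkg:"):]
    rest = rest.split("#", 1)[0]   # drop the subpath
    rest = rest.split("?", 1)[0]   # drop the qualifiers
    rest = rest.rsplit("@", 1)[0]  # drop the version, if present
    parts = [p for p in rest.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"purl needs at least a type and a name: {purl!r}")
    # First segment is the type, last is the name, anything between is
    # the (optional) namespace.
    return parts[0].lower(), "/".join(parts[1:-1]), parts[-1]


def group_purls(purls: list[str]) -> dict[tuple[str, str, str], list[str]]:
    """Group purls that would share one VEX file under the proposed rule."""
    groups = defaultdict(list)
    for purl in purls:
        groups[purl_group_key(purl)].append(purl)
    return dict(groups)
```

With this key, `pkg:apk/wolfi/example@1.0.0?arch=armv7` and `pkg:apk/wolfi/example@1.0.0?arch=x86_64` fall into the same group, `("apk", "wolfi", "example")`, because they differ only in qualifiers.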

Example:

  "statements": [
    {
      "vulnerability": {
        "name": "CVE-2023-12345"
      },
      "products": [
        {"@id": "pkg:apk/wolfi/[email protected]?arch=armv7"},
        {"@id": "pkg:apk/wolfi/[email protected]?arch=x86_64"}
      ],
      "status": "fixed"
    }
  ]

  • golang.openvex.json
  • oci.openvex.json
    • pkg:oci/repo@sha256:xxx?repository_url=docker.io/org/repo&arch=amd64
    • pkg:oci/repo@sha256:yyy?repository_url=docker.io/org/repo&arch=arm64
    • pkg:oci/repo@sha256:yyy?repository_url=ghcr.io/org/repo&arch=amd64

I'm also open to the idea of combining all relevant products into one VEX file per format (e.g., openvex.json and csaf.json).

@bureado

bureado commented Jul 9, 2024

I'm also open to the idea of combining all relevant products into one VEX file per format (e.g., openvex.json and csaf.json)

What are the drawbacks/disadvantages of having a single .vex/openvex.json per repository?

@knqyf263
Contributor Author

knqyf263 commented Jul 10, 2024

What are the drawbacks/disadvantages of having a single .vex/openvex.json per repository?

I plan on somehow grabbing this VEX file when scanning for vulnerabilities, and having it all in one file means there will be some inefficiencies in file size and processing speed since it contains a lot of unnecessary statements (irrelevant products).

For example, statements about OCI images are unnecessary when scanning Go binaries. If they are grouped together in a certain unit, it is efficient to obtain only the necessary VEX documents, but if we separate files by version or arch, the number of files will become too large, so I proposed grouping by type, namespace, and name. However, I don't think it's a big deal even if all the products are in one file, as it might just make things a little less efficient (or harder for humans to view).
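
To make the trade-off concrete, here is a small sketch of how a combined OpenVEX document could be split into per-type statement lists, so that a Go scan never has to load OCI statements. It assumes products are identified by a purl in their `@id` field, as in the example earlier in this thread. The function names and the `"unknown"` bucket are hypothetical choices for illustration.

```python
from collections import defaultdict


def purl_type(purl: str) -> str:
    """Return the purl type (e.g. 'oci', 'golang'), or 'unknown'."""
    if not purl.startswith("pkg:"):
        return "unknown"
    return purl[len("pkg:"):].split("/", 1)[0].lower() or "unknown"


def split_statements_by_type(doc: dict) -> dict[str, list[dict]]:
    """Split a document's statements into one list per purl type.

    A statement whose products span several types is copied into each
    bucket, keeping only the products of that type.
    """
    by_type = defaultdict(list)
    for stmt in doc.get("statements", []):
        products_by_type = defaultdict(list)
        for product in stmt.get("products", []):
            products_by_type[purl_type(product.get("@id", ""))].append(product)
        for ptype, products in products_by_type.items():
            by_type[ptype].append({**stmt, "products": products})
    return dict(by_type)
```

Each resulting bucket could then be written to its own file (for example `oci.openvex.json`), or a client could apply the same function to a single combined file and keep only the bucket it needs.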

@bureado

bureado commented Jul 10, 2024

@knqyf263 I see, thanks. I think this could be resolved with clarity in the client contract. Let me describe how I understand the client will find the VEX file in the location you're proposing, and how it'll use it. This is my understanding from your proposal, so I could be wrong:

  1. The client is scanning some runtime artifact for vulnerabilities, which could be an entire VM, a Kubernetes deployment, a sole container image, etc.
  2. The scanner finds n=1000s of software components in there, all represented in purl strings.
  3. The scanner does whatever it does to determine CVEs across all of those purls.
  4. As part of its suppression logic, the scanner has to see if a given purl has any VEX documents that can counter a CVE
  5. Since no VEX (or no VEX matching that purl) was provided by the user, it then enters "discovery" mode
  6. The problem at hand is how do we go from pkg:oci/repo@sha256:bababa to https[:]//github.com/awesomeproject/greatrepo, from where we simply pivot to the default branch, and look into .vex/openvex.json, for example.
  7. The problem above is easier for Golang libraries than for almost anything else, but this proposal doesn't aim to solve that problem. It assumes the scanner or the client knows which git remote and default branch name to use before pivoting to .vex/openvex.json.
  8. .vex/openvex.json is always kept up-to-date at the tip of the default branch, and can reference the purls of any artifact that is created from this source repository. That means the file will have all product types (debs, rpms, container images, golang libs, etc.), and all versions.
  9. The scanner will fetch the OpenVEX file, but it will only suppress the findings that correspond to the right purl match. So if a repository produces a Golang library and a deb package, and the scanner only found the deb package, it'll suppress those findings only.
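
Steps 8 and 9 can be sketched as follows. This is a minimal illustration, not how any particular scanner works. It assumes findings are dicts with hypothetical `purl` and `cve` keys, that only the `not_affected` and `fixed` statuses suppress a finding, and that products match by exact string equality on `@id`. A real client would need more careful purl matching.

```python
# OpenVEX statuses assumed here to counter a finding.
SUPPRESSING_STATUSES = {"not_affected", "fixed"}


def suppress(findings: list[dict], vex_doc: dict) -> list[dict]:
    """Return the findings that are NOT covered by the VEX document."""
    covered = set()
    for stmt in vex_doc.get("statements", []):
        if stmt.get("status") not in SUPPRESSING_STATUSES:
            continue
        vuln = stmt.get("vulnerability", {}).get("name")
        for product in stmt.get("products", []):
            # Exact match on the purl string: a simplifying assumption.
            covered.add((product.get("@id"), vuln))
    return [f for f in findings if (f["purl"], f["cve"]) not in covered]
```

If the document lists both a Golang library and a deb package for the same CVE but the scanner only found the deb package, only the deb finding is dropped. Statements about products the scanner never saw have no effect.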

Having a single OpenVEX file per repository is a way for the owners of orgfoo/projectbar to say: "this is the list of orgfoo counter claims to vulnerabilities in projectbar" and it's a one-stop shop for clients. Clients still have to:

  1. Find a way to go from a purl to a git remote and default branch name tuple
  2. Responsible for matching their purl with one in the products array before suppressing a finding

Note: since (1) is an open, hard problem (I wonder if reverse lookups on things like SECURITY-INSIGHTS could be interesting, CC @luigigubello), we could say that the choice between a single file and many files (one per package type, package name, version, architecture) isn't that big of a problem. This is similar to how APT and RPM can fetch the right manifest by convention once you give them a base URL to a repository. The difference is that the purl that the client would need to have is much more precise. And my concern is that it raises the bar on both the VEX producer and the consumer. For example, what happens if the producer or the consumer are assessing version ranges?

I apologize for the long comment, and I apologize if I misinterpreted or misrepresented anything. I appreciate the opportunity to think deeply about this problem. I still think there are many merits to having an agreeable convention for repository owners to keep their VEX counterclaims under source control, happily solving part of the distribution problem too, and to that extent I would suggest lowering the bar making it simpler for producers and consumers to discover, acquire and process a single file for all artifacts produced from a given repository.

@knqyf263
Contributor Author

Yes, you're correct!

  1. Find a way to go from a purl to a git remote and default branch name tuple

True. I have an idea for that, but it's out of scope in this request. Let's assume that clients find the source repository in some way.

  1. Responsible for matching their purl with one in the products array before suppressing a finding

That's also true. Clients have to implement the same logic as go-vex.

For example, what happens if the producer or the consumer are assessing version ranges?

There is no way to specify version ranges now. #26

I would suggest lowering the bar making it simpler for producers and consumers to discover, acquire and process a single file for all artifacts produced from a given repository.

To start with a simple specification, a single file, like vex.json, works for me. For future extensibility, I still think it is better to define a .vex/ directory, so people may end up using .vex/vex.json for most use cases.

BTW, we already have two files under .vex. This is because these files are produced differently. One is generated with govulncheck. But we can still merge them into a single file if the spec enforces it.

@knqyf263
Contributor Author

knqyf263 commented Aug 1, 2024

  1. Find a way to go from a purl to a git remote and default branch name tuple

FYI: This is what we did.


4 participants