ADR 44: describing support of SPDX SBOM format in Konflux #213

Merged
merged 16 commits into from
Dec 2, 2024
285 changes: 285 additions & 0 deletions ADR/0044-spdx-support.md
@@ -0,0 +1,285 @@
# 44. SPDX SBOM support

Date: 2024-09-24

## Status

Proposed

## Glossary
* SBOM - Software Bill of Materials
* SPDX - Software Package Data Exchange
* PURL - Package URL
* Builder images - Images used in FROM instructions in the `Dockerfile`
* Root package - The package representing the source of the SBOM itself

## Context

The SPDX SBOM format enables features that are not available in CycloneDX, such as multiple purl attributes per component. SPDX is also a widely adopted standard for software bills of materials.
This ADR describes how to enable the use of the SPDX SBOM format in Konflux.

## Decision

### SBOM lifecycle in build pipeline

At the start of the build pipeline, SBOMs are generated by [cachi2](#references) and [syft](#references). These two SBOM files are merged into a single SBOM document. In a later phase of the build pipeline, the builder images of the image being built are added to the SBOM as build dependencies of the image. To switch to the SPDX format, all tools that produce or process SBOMs in the pipeline have to be able to work with SPDX. SBOMs of builder images are not processed by the pipeline, so they don't have to be in SPDX format. Consequently, once the tools that generate the SBOMs are switched to SPDX, all tools that process SBOMs can expect SPDX only. No tool needs to handle mixed SPDX and CycloneDX inputs.
As a result, Tekton tasks should implement the sbomType attribute to specify the expected SBOM format for input and output. This will allow tools to be tested with SPDX before the entire pipeline transitions to this format.

### CycloneDX -> SPDX conversion

CycloneDX (1.5) is a structured document in JSON format with the following structure (not the full specification):

- Document
    - Metadata
        - Tools
            - `List<Tool>`
                - vendor
                - name
        - `<Component>` (attributes same as below)
    - Components
        - `List<Component>`
            - name
            - version
            - purl
            - properties
                - `List<Property>`
                    - name
                    - value
    - formulations
        - `List<Formulation>`

SPDX (2.3) is a structured document in JSON format with the following structure (not the full specification):

- Document
    - name
    - SPDXID
    - creationInfo
        - creators
            - `List<String>`
    - packages
        - `List<Package>`
            - SPDXID
            - name
            - versionInfo
            - externalRefs
                - `List<ExternalRef>`
                    - referenceCategory
                    - referenceType
                    - referenceLocator
            - annotations
                - `List<Annotation>`
                    - annotationDate
                    - annotationType
                    - annotator
                    - comment
    - relationships
        - `List<Relationship>`
            - spdxElementId
            - relationshipType
            - relatedSpdxElement

#### 1:1 conversions
The following CycloneDX attributes are converted to SPDX attributes 1:1, as they represent the same thing.

| CycloneDX Attribute | SPDX Attribute |
|----------------------------|---------------------|
| components | packages |
| component.name | package.name |
| component.version | package.versionInfo |


#### Component.purl
CycloneDX (version 1.5) supports only a single purl attribute per component. SPDX doesn’t have a direct equivalent; instead, every package includes an externalRefs array that describes all external references for the package. Reference categories and types are predefined. For a PURL, the category `PACKAGE-MANAGER` and the type `purl` are used. The purl itself is stored as the referenceLocator.
```
| CycloneDX Attribute      | SPDX Attribute                                                 |
|--------------------------|----------------------------------------------------------------|
| component.purl = <PURL>  | package.externalRefs = [{referenceCategory: "PACKAGE-MANAGER", |
|                          |                          referenceType: "purl",                |
|                          |                          referenceLocator: <PURL>              |
|                          |                        }]                                      |
```

Comment on lines +96 to +101

Contributor:

Could you fix the formatting in this table and the tables below? It's not readable in the rendered doc https://github.com/midnightercz/konflux-ci-architecture/blob/spdx-support/ADR/0044-spdx-support.md#componentpurl

Contributor Author:

I believe this was fixed. All tables look correct to me

Contributor:

The loss of indentation hurts readability, especially for this one

How about changing them from Markdown tables to monospace blocks? Or just wrapping the tables in triple backticks so that they look the same as in the source code
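
The 1:1 conversions and the purl mapping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `component_to_package` and the `SPDXRef-Package-<index>-<name>` ID scheme are hypothetical, and `downloadLocation` is set to `NOASSERTION` because SPDX 2.3 requires the field while CycloneDX has no direct source for it.

```python
import re


def component_to_package(component: dict, index: int) -> dict:
    """Convert one CycloneDX component into an SPDX 2.3 package (sketch)."""
    name = component["name"]
    # SPDX identifiers may only contain letters, digits, "." and "-".
    safe_name = re.sub(r"[^A-Za-z0-9.-]", "-", name)
    package = {
        "SPDXID": f"SPDXRef-Package-{index}-{safe_name}",  # hypothetical ID scheme
        "name": name,  # component.name -> package.name
        "downloadLocation": "NOASSERTION",  # required by SPDX, unknown here
    }
    if "version" in component:
        # component.version -> package.versionInfo
        package["versionInfo"] = component["version"]
    if "purl" in component:
        # component.purl -> a PACKAGE-MANAGER/purl external reference
        package["externalRefs"] = [
            {
                "referenceCategory": "PACKAGE-MANAGER",
                "referenceType": "purl",
                "referenceLocator": component["purl"],
            }
        ]
    return package


component = {"name": "attrs", "version": "24.2.0", "purl": "pkg:pypi/attrs@24.2.0"}
package = component_to_package(component, 0)
```

Because `externalRefs` is a list, further purls for the same package can later be appended to it, which is the feature that motivates the switch to SPDX.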

#### Component.properties
CycloneDX component properties are a list of string:string name/value pairs for a given component. SPDX packages have nothing similar. Package annotations are the only attribute where custom data can be stored, and the only “customizable” field of an annotation is comment, which is a simple string. For that reason, each CycloneDX property of the form `{"name": <string>, "value": <string>}` is encoded into a JSON string and stored in the annotation comment. Annotations may also be produced by other tools. Therefore, to be able to tell that an annotation comment is JSON-encoded, the annotator should end with the string `:jsonencoded`.

```
| CycloneDX Attribute         | SPDX Attribute                             |
|-----------------------------|--------------------------------------------|
| components.properties = [   | package.annotations = [                    |
|   {"name": …, "value": …}   |   {..., annotator: "<tool>:jsonencoded"}   |
| ]                           | ]                                          |
```
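
A minimal sketch of this encoding and its reverse, in Python. The function names, the `Tool: ` prefix on the annotator, and the choice of `OTHER` as annotationType are assumptions made for the example; the ADR itself only fixes the `:jsonencoded` suffix and the JSON-encoded comment.

```python
import json
from datetime import datetime, timezone
from typing import Optional

JSON_SUFFIX = ":jsonencoded"


def property_to_annotation(prop: dict, tool: str, now: datetime) -> dict:
    """Encode one CycloneDX property as an SPDX package annotation (sketch)."""
    return {
        # The suffix marks the comment as JSON for consumers.
        "annotator": f"Tool: {tool}{JSON_SUFFIX}",
        "annotationDate": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "annotationType": "OTHER",  # assumption: not a review annotation
        "comment": json.dumps({"name": prop["name"], "value": prop["value"]}),
    }


def annotation_to_property(annotation: dict) -> Optional[dict]:
    """Decode an annotation back into a property, or None if it is not ours."""
    if not annotation.get("annotator", "").endswith(JSON_SUFFIX):
        return None  # produced by another tool; the comment is free text
    return json.loads(annotation["comment"])


prop = {"name": "example:property", "value": "true"}  # hypothetical property
annotation = property_to_annotation(
    prop, "example-tool", datetime(2024, 9, 24, tzinfo=timezone.utc)
)
```

Checking the suffix before calling `json.loads` is what lets annotations from other tools coexist in the same package without being misparsed.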

#### Formulations
CycloneDX formulations describe how the container was manufactured. In SPDX, Relationship elements can be used for the same purpose. Every element in SPDX has an SPDXID attribute, an identifier that is unique within the whole SBOM document. A Relationship element describes the relation between two elements using their SPDXIDs and a relationship type. The relationship type `BUILD_TOOL_OF` can be used to express that a package was used to build the container.

```
| CycloneDX Attribute            | SPDX Attribute                                   |
|--------------------------------|--------------------------------------------------|
| Formulations.components = [{}] | Relationships = [                                |
|                                |   {                                              |
|                                |     spdxElementId = <DOCUMENT-ID>,               |
|                                |     relationshipType = DESCRIBES,                |
|                                |     relatedSpdxElement = <ROOT-PACKAGE>          |
|                                |   },                                             |
|                                |   {                                              |
|                                |     spdxElementId = <A-BUILDER-IMAGE-ID>,        |
|                                |     relationshipType = BUILD_TOOL_OF,            |
|                                |     relatedSpdxElement = <ROOT-PACKAGE>          |
|                                |   }                                              |
|                                | ]                                                |
```

*Explanation: Root document `DESCRIBES` `ROOT-PACKAGE` element which represents the container itself. `BUILDER-IMAGE-ID` represents the builder image which was used to build the container. The relationship type `BUILD_TOOL_OF` is used to express that the builder image was used to build the container image.*
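
As an illustration, the following Python sketch appends builder images to an SPDX document in the way the table describes. The function name `add_builder_images` and the sample IDs are hypothetical, and the sketch assumes the document has exactly one `DESCRIBES` relationship pointing at the root package.

```python
def add_builder_images(sbom: dict, builder_packages: list) -> dict:
    """Record each builder image as BUILD_TOOL_OF the root package (sketch)."""
    # The root package is the element the document DESCRIBES.
    root_id = next(
        rel["relatedSpdxElement"]
        for rel in sbom["relationships"]
        if rel["relationshipType"] == "DESCRIBES"
    )
    for package in builder_packages:
        sbom["packages"].append(package)
        sbom["relationships"].append(
            {
                "spdxElementId": package["SPDXID"],  # the builder image
                "relationshipType": "BUILD_TOOL_OF",
                "relatedSpdxElement": root_id,  # the image that was built
            }
        )
    return sbom


sbom = {
    "SPDXID": "SPDXRef-DOCUMENT",
    "packages": [{"SPDXID": "SPDXRef-image", "name": "my-image"}],
    "relationships": [
        {
            "spdxElementId": "SPDXRef-DOCUMENT",
            "relationshipType": "DESCRIBES",
            "relatedSpdxElement": "SPDXRef-image",
        }
    ],
}
builder = {"SPDXID": "SPDXRef-builder-0", "name": "builder-image"}  # hypothetical
sbom = add_builder_images(sbom, [builder])
```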

#### Metadata.tools
The CycloneDX metadata.tools sub-attributes that we are mostly interested in are the vendor and name elements. Information about the creation of the SPDX document can be stored in creationInfo. The creationInfo.creators element is basically a list of strings. The standard gives only a vague specification ([here](https://spdx.github.io/spdx-spec/v2.3/document-creation-information/#68-creator-field)) of how it should be structured. Strings should be formatted as `<Attribute>: <Value>`. For example, a vendor should be stored as `Vendor: <vendor>`. The recommendation is to use only `Tool`, as a vendor entry can be misinterpreted as the vendor of the SBOM rather than of the tool which created it.

| CycloneDX Attribute                             | SPDX Attribute                      |
|-------------------------------------------------|-------------------------------------|
| Metadata.tools = [{"vendor": "X", "name": "Y"}] | CreationInfo.creators = ["Tool: Y"] |
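
A short Python sketch of this mapping follows. It assumes the list form of `metadata.tools` shown in the table, and the function name `tools_to_creators` is made up for the example.

```python
def tools_to_creators(tools: list) -> list:
    """Map CycloneDX metadata.tools entries to SPDX creator strings (sketch)."""
    # Only the tool name is kept; the vendor is dropped on purpose so it is
    # not mistaken for the vendor of the SBOM itself.
    return [f"Tool: {tool['name']}" for tool in tools if tool.get("name")]


creators = tools_to_creators([{"vendor": "X", "name": "Y"}])
```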

#### Metadata.component
The metadata component describes the component that the whole SBOM relates to. For example, if the SBOM describes the internal components and dependencies of a container image, this component should represent the container image itself. In SPDX, the equivalent of this component is the root package (see [root-packages](#root-packages) for how syft handles this package). This package appears in the relationship `SPDXRef-ROOT` `DESCRIBES` `SPDXRef-RootPackage`.

Example:
We run syft/cachi2 on the source directory of a project that is to be built into a container. The generated SBOM contains:
```jsonc
{
  "SPDXID": "SPDXRef-DOCUMENT",
  // ...
  "packages": [
    {
      "name": ".",
      "SPDXID": "SPDXRef-DocumentRoot-Directory-.",
      "supplier": "NOASSERTION",
      "downloadLocation": "NOASSERTION",
      "filesAnalyzed": false,
      "licenseConcluded": "NOASSERTION",
      "licenseDeclared": "NOASSERTION",
      "primaryPackagePurpose": "FILE"
    },
    {
      "name": "attrs",
      "SPDXID": "SPDXRef-Package-python-attrs-eef51168ca2a575f",
      "versionInfo": "24.2.0"
      // ...
    }
    // ...
  ],
  "relationships": [
    {
      "spdxElementId": "SPDXRef-DOCUMENT",
      "relatedSpdxElement": "SPDXRef-DocumentRoot-Directory-.",
      "relationshipType": "DESCRIBES"
    },
    {
      "spdxElementId": "SPDXRef-DocumentRoot-Directory-.",
      "relatedSpdxElement": "SPDXRef-Package-python-attrs-eef51168ca2a575f",
      "relationshipType": "CONTAINS"
    }
    // ...
  ]
}
```

We want to express that the SBOM is actually generated for a container image, not for a source directory.
So we remove the `SPDXRef-DocumentRoot-Directory-.` package, add a new virtual package representing the container image, and replace the old SPDX ID in the relationships with the ID of the new package. The new SBOM should look like this:

```jsonc
{
  "SPDXID": "SPDXRef-DOCUMENT",
  // ...
  "packages": [
    {
      "name": "my-image",
      "SPDXID": "SPDXRef-image",
      "versionInfo": "latest",
      // ...
      "checksums": [
        {
          "algorithm": "SHA256",
          "checksumValue": "9ac75c1a392429b4a087971cdf9190ec42a854a169b6835bc9e25eecaf851258"
        }
      ],
      // ...
      "externalRefs": [
        {
          "referenceCategory": "PACKAGE-MANAGER",
          "referenceType": "purl",
          "referenceLocator": "pkg:oci/my-image@sha256:9ac75c1a392429b4a087971cdf9190ec42a854a169b6835bc9e25eecaf851258?repository_url=container-registry.com/my-org/my-image"
        }
      ],
      "primaryPackagePurpose": "CONTAINER"
    },
    {
      "name": "attrs",
      "SPDXID": "SPDXRef-Package-python-attrs-eef51168ca2a575f",
      "versionInfo": "24.2.0"
      // ...
    }
    // ...
  ],
  "relationships": [
    {
      "spdxElementId": "SPDXRef-DOCUMENT",
      "relatedSpdxElement": "SPDXRef-image",
      "relationshipType": "DESCRIBES"
    },
    {
      "spdxElementId": "SPDXRef-image",
      "relatedSpdxElement": "SPDXRef-Package-python-attrs-eef51168ca2a575f",
      "relationshipType": "CONTAINS"
    }
    // ...
  ]
}
```
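
The root package swap shown in the two documents above could be implemented roughly as follows. This is a sketch, not the pipeline's actual code: the function name `replace_root_package` is hypothetical, and the root package is assumed to be whatever the document `DESCRIBES`.

```python
def replace_root_package(sbom: dict, image_package: dict) -> dict:
    """Swap the package the document DESCRIBES for image_package (sketch)."""
    old_root_ids = {
        rel["relatedSpdxElement"]
        for rel in sbom["relationships"]
        if rel["relationshipType"] == "DESCRIBES"
    }
    new_id = image_package["SPDXID"]
    # Drop the old root package and put the image package in its place.
    packages = [p for p in sbom["packages"] if p["SPDXID"] not in old_root_ids]
    relationships = []
    for rel in sbom["relationships"]:
        rel = dict(rel)  # copy so the input document is not modified
        for field in ("spdxElementId", "relatedSpdxElement"):
            if rel[field] in old_root_ids:
                rel[field] = new_id
        relationships.append(rel)
    return {**sbom, "packages": [image_package] + packages, "relationships": relationships}


attrs_id = "SPDXRef-Package-python-attrs-eef51168ca2a575f"
source_sbom = {
    "SPDXID": "SPDXRef-DOCUMENT",
    "packages": [
        {"SPDXID": "SPDXRef-DocumentRoot-Directory-.", "name": "."},
        {"SPDXID": attrs_id, "name": "attrs", "versionInfo": "24.2.0"},
    ],
    "relationships": [
        {
            "spdxElementId": "SPDXRef-DOCUMENT",
            "relatedSpdxElement": "SPDXRef-DocumentRoot-Directory-.",
            "relationshipType": "DESCRIBES",
        },
        {
            "spdxElementId": "SPDXRef-DocumentRoot-Directory-.",
            "relatedSpdxElement": attrs_id,
            "relationshipType": "CONTAINS",
        },
    ],
}
image = {"SPDXID": "SPDXRef-image", "name": "my-image", "primaryPackagePurpose": "CONTAINER"}
image_sbom = replace_root_package(source_sbom, image)
```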


#### Merging SPDX
##### Packages
Packages of two SPDX documents can be merged by concatenating the two lists. In CycloneDX, a component can have only a single purl attribute, so components representing packages with the same name and version but different purls have to be stored as multiple elements. SPDX package elements can carry multiple purls. Therefore, multiple CycloneDX components can be squashed into a single SPDX package element, with their purls concatenated into a single list. The following rules apply to the generic package merging process:
- Packages whose purls have the same package name, version, and type are squashed into a single package element

Contributor @chmeliik, Nov 15, 2024:

Similar concern to #213 (comment) - this shouldn't be the authoritative decision on how to merge packages. Not all tools should do it this way, cachi2 certainly shouldn't

Contributor @chmeliik, Nov 15, 2024:

Perhaps we can just re-word this to make it clear that this is a sort of default suggestion for how to do the merging if you don't have more specific requirements

Contributor Author:

yeah, good point

*NOTE: packages cannot be merged based on the SPDXID attribute, as the SPDX standard does not specify how an SPDXID should be calculated. Individual tools can calculate it differently while still satisfying the requirement that it be unique across the whole document.*
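
The squashing rule can be sketched in Python as below. Several things here are assumptions for the sake of a self-contained example: the purl parser is a deliberately simplified one that ignores the namespace (following the rule as worded), each package is keyed by its first purl only, and all function names are hypothetical. A real tool would more likely use a dedicated purl library.

```python
from urllib.parse import unquote


def purl_key(purl: str) -> tuple:
    """Return (type, name, version) of a purl; simplified parser (sketch)."""
    body = purl.split("#", 1)[0].split("?", 1)[0]  # drop subpath and qualifiers
    if not body.startswith("pkg:"):
        raise ValueError(f"not a purl: {purl}")
    purl_type, _, rest = body[len("pkg:"):].lstrip("/").partition("/")
    path, version = rest.rsplit("@", 1) if "@" in rest else (rest, "")
    name = path.rsplit("/", 1)[-1]  # the namespace is ignored
    return purl_type.lower(), unquote(name), unquote(version)


def purls_of(package: dict) -> list:
    return [
        ref["referenceLocator"]
        for ref in package.get("externalRefs", [])
        if ref.get("referenceType") == "purl"
    ]


def squash_packages(packages: list) -> list:
    """Squash packages sharing purl type, name and version (sketch)."""
    kept_by_key, result = {}, []
    for package in packages:
        purls = purls_of(package)
        if not purls:
            result.append(package)  # nothing to compare, keep as-is
            continue
        key = purl_key(purls[0])
        if key not in kept_by_key:
            kept = {**package, "externalRefs": list(package["externalRefs"])}
            kept_by_key[key] = kept
            result.append(kept)
            continue
        # Same package seen again: only carry over the new references.
        for ref in package["externalRefs"]:
            if ref not in kept_by_key[key]["externalRefs"]:
                kept_by_key[key]["externalRefs"].append(ref)
    return result


def purl_ref(purl: str) -> dict:
    return {"referenceCategory": "PACKAGE-MANAGER", "referenceType": "purl", "referenceLocator": purl}


squashed = squash_packages([
    {"SPDXID": "SPDXRef-a", "name": "attrs", "externalRefs": [purl_ref("pkg:pypi/attrs@24.2.0")]},
    {"SPDXID": "SPDXRef-b", "name": "attrs", "externalRefs": [
        purl_ref("pkg:pypi/attrs@24.2.0?repository_url=https://example.com/simple")]},
    {"SPDXID": "SPDXRef-c", "name": "idna", "externalRefs": [purl_ref("pkg:pypi/idna@3.7")]},
])
```

Note that the SPDXID of a squashed-away package (`SPDXRef-b` here) disappears, so any relationship that referred to it has to be remapped to the surviving package.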

##### Relationships
SPDX relationships represent a graph/tree structure of relations between the elements in the document. The root element is the SPDX document itself (with SPDXID `SPDXRef-DOCUMENT`).
The SPDX document typically contains a package representing the source used to generate the SBOM. This can be a container image, a directory, etc. The document is in a `DESCRIBES` relationship with this source package, which is called the root package in this document. Other packages are in specific relationships with the root package. See also [syft specific sbom details](#syft-specific-sbom-details).

The relationships of the two documents need to be merged into a single graph in a way that preserves the graph structure of the main document (the one the other document is merged into). Once the packages are merged, the relationships of the second document must be cleared of relationships that refer to packages not included in the merged package list. Any spdxElementId or relatedSpdxElement that points to the document ID of the second document should be replaced with the document ID of the main document. Likewise, the root package element ID of the second document needs to be replaced with the root package element ID of the main document.
Example:

```
|------------------------|------------------------|------------------------|
| DOC 1                  | DOC 2                  | Merged document        |
|------------------------|------------------------|------------------------|
| Doc1                   | Doc2                   | Doc1                   |
| Packages:              | Packages:              | Packages:              |
|   root1                |   root2                |   root1                |
|   p1                   |   p2                   |   p1                   |
|   p2                   |   p3                   |   p2                   |
| Relationships:         | Relationships:         |   p3                   |
|   Doc1 describes root1 |   Doc2 describes root2 | Relationships:         |
|   root1 contains p1    |   root2 contains p2    |   Doc1 describes root1 |
|   root1 contains p2    |   root2 contains p3    |   root1 contains p1    |
|                        |                        |   root1 contains p2    |
|                        |                        |   root1 contains p3    |
```
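
The merge in the table above can be reproduced with a small Python sketch. To keep it short, packages are matched by name and versionInfo as a stand-in for the purl-based rule, and SPDXIDs of the two documents are assumed not to collide; both simplifications, as well as the function names, are assumptions of this example.

```python
def describes_target(doc: dict) -> str:
    """Return the SPDXID of the root package (the DESCRIBES target)."""
    return next(
        rel["relatedSpdxElement"]
        for rel in doc["relationships"]
        if rel["relationshipType"] == "DESCRIBES"
    )


def merge_documents(main: dict, other: dict) -> dict:
    """Merge other into main, keeping main's graph structure (sketch)."""
    other_root = describes_target(other)
    # Second document's document ID and root package map onto the main ones.
    remap = {other["SPDXID"]: main["SPDXID"], other_root: describes_target(main)}

    packages = list(main["packages"])
    by_key = {(p["name"], p.get("versionInfo")): p["SPDXID"] for p in packages}
    for package in other["packages"]:
        if package["SPDXID"] == other_root:
            continue  # the second root package is dropped
        key = (package["name"], package.get("versionInfo"))
        if key in by_key:
            remap[package["SPDXID"]] = by_key[key]  # duplicate of a main package
        else:
            by_key[key] = package["SPDXID"]
            packages.append(package)

    known_ids = {main["SPDXID"]} | {p["SPDXID"] for p in packages}
    relationships = list(main["relationships"])
    for rel in other["relationships"]:
        rel = {**rel}
        for field in ("spdxElementId", "relatedSpdxElement"):
            rel[field] = remap.get(rel[field], rel[field])
        # Skip relationships to unknown elements and exact duplicates.
        endpoints_known = rel["spdxElementId"] in known_ids and rel["relatedSpdxElement"] in known_ids
        if endpoints_known and rel not in relationships:
            relationships.append(rel)
    return {**main, "packages": packages, "relationships": relationships}


def make_doc(prefix: str, root: str, names: list) -> dict:
    """Build a toy document like the ones in the table."""
    def spdx_id(name):
        return f"SPDXRef-{prefix}-{name}"
    rels = [{"spdxElementId": "SPDXRef-DOCUMENT", "relationshipType": "DESCRIBES",
             "relatedSpdxElement": spdx_id(root)}]
    rels += [{"spdxElementId": spdx_id(root), "relationshipType": "CONTAINS",
              "relatedSpdxElement": spdx_id(n)} for n in names]
    return {"SPDXID": "SPDXRef-DOCUMENT",
            "packages": [{"SPDXID": spdx_id(n), "name": n} for n in [root, *names]],
            "relationships": rels}


merged = merge_documents(make_doc("a", "root1", ["p1", "p2"]), make_doc("b", "root2", ["p2", "p3"]))
```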

## Syft specific sbom details
### Root packages
Syft generates a "source package" representing the source used to generate the SBOM document. For example, when the SBOM is generated by the command `syft scan dir:<dir>`, a package
with SPDXID `SPDXRef-DocumentRoot-Directory-<dir>-` is generated. This package has its name set to `<dir>`, no versionInfo, and no other attributes. In the relationships, this package appears in the relation `SPDXRef-DOCUMENT` `DESCRIBES` `SPDXRef-DocumentRoot-Directory-<dir>-`, and all other packages are in a CONTAINS relation with this virtual package, e.g. `openshift4----ose-cluster-update-keys` `<RELATIONSHIP-TYPE>` `Package-A`.

## Consequences
All tooling used in the pipeline needs to support the SPDX SBOM format.

## References
* [CycloneDX specification](https://cyclonedx.org/specification/overview/)
* [SPDX specification](https://spdx.github.io/spdx-spec/v2.3/)
* [SPDX json schema](https://github.com/spdx/spdx-spec/blob/development/v2.3/schemas/spdx-schema.json)
* [cachi2](https://github.com/containerbuildsystem/cachi2/)
* [syft](https://github.com/anchore/syft)