Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added SPDX format support in sbom scripts [CLOUDDST-24250] #163

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

midnightercz
Copy link

@midnightercz midnightercz commented Sep 17, 2024

Added support of SPDX format for helper scripts manipulating SBOMs [CLOUDDST-24250]
Related to ADR 0039-spdx-support

Copy link
Contributor

@chmeliik chmeliik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution!

Could you give the PR a more thorough description and link the relevant issues, docs etc.? There's a lot of guesswork for us here.

(Reviewed only base_images_sbom_script.py so far)

@mkosiarc
Copy link
Contributor

mkosiarc commented Oct 8, 2024

If the newly added scripts are to be used in the buildah task in konflux, they need to be added to the Dockerfile as well - https://github.com/konflux-ci/build-tasks-dockerfiles/blob/main/sbom-utility-scripts/Dockerfile

This means it will be part of the image that is used in the individual steps that are working with sbom in the buildah task, e.g. https://github.com/konflux-ci/build-tasks-dockerfiles/blob/main/sbom-utility-scripts/Dockerfile

@mkosiarc
Copy link
Contributor

mkosiarc commented Oct 8, 2024

Is this work tracked internally somewhere?

@mkosiarc
Copy link
Contributor

mkosiarc commented Oct 8, 2024

Seems that the CI did not run automatically because you are not a maintainer. I had to approve it and it seems to run now. Sorry about that.

The new testcase is failing:

        else:
>           root_element1, map1, inverse_map1 = map_relationships(sbom['relationships'])
E           KeyError: 'relationships'
base_images_sbom_script.py:165: KeyError
=========================== short test summary info ============================
FAILED test_base_images_sbom_script.py::test_main_input_sbom_spdx_minimal - K...

You can also test this locally with tox -e test

@midnightercz
Copy link
Author

Is this work tracked internally somewhere?

It's tracked in CLOUDDST-24250, I created the issue after I started working on this, so that's why it's not in the commit mesage

if contains and inverse_map1.get(r) == root_element1:
middle_element1 = r
# if not middle_element1:
# middle_element1 = root_element1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is probably left by mistake?

@mkosiarc
Copy link
Contributor

@chmeliik does @midnightercz need to be added to the rhtap-build namespace, so that the pipeline can be triggered from this PR?

@chmeliik
Copy link
Contributor

chmeliik commented Oct 10, 2024

just needs /ok-to-test no?

Huh, maybe not 😕

@chmeliik
Copy link
Contributor

/retest

@chmeliik
Copy link
Contributor

/ok-to-test

@mkosiarc
Copy link
Contributor

Is this work tracked internally somewhere?

It's tracked in CLOUDDST-24250, I created the issue after I started working on this, so that's why it's not in the commit mesage

You could always edit the commit message and make life easier for everybody in the future, but now it is at least mentioned in the pull request.

@mkosiarc
Copy link
Contributor

/ok-to-test

@mmorhun
Copy link
Collaborator

mmorhun commented Oct 11, 2024

/ok-to-test

@midnightercz midnightercz changed the title Added SPDX format support in sbom scripts Added SPDX format support in sbom scripts [CLOUDDST-24250] Oct 15, 2024
Copy link
Contributor

@tkdchen tkdchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please describe the problem, which this PR addresses, in detail in the commit message of 1b01170. The information may be about what is missing by current implementation, what's the purpose of adding SPDX format support, etc.

@midnightercz
Copy link
Author

Please describe the problem, which this PR addresses, in detail in the commit message of 1b01170. The information may be about what is missing by current implementation, what's the purpose of adding SPDX format support, etc.

I added information about ADR describing the problem

@zregvart
Copy link
Member

What can be done to get this merged? The EC team is awaiting this work to be done before proceeding with the full SPDX support.

@chmeliik
Copy link
Contributor

What can be done to get this merged? The EC team is awaiting this work to be done before proceeding with the full SPDX support.

Currently the SPDX feature is in design phase konflux-ci/architecture#213

@midnightercz
Copy link
Author

@chmeliik @zregvart I added changes from ADR in last commit + some explanatory comments. I also added test to compare merged cyclonedx and spdx to ensure they are equal (in sense of packages and purls).
Also, question: Is merge_syft_sboms_spdx.py used anywhere? It looks like it's not used in build-definitions repo. Is it possible it's deprecated script?

@zregvart
Copy link
Member

Also, question: Is merge_syft_sboms_spdx.py used anywhere? It looks like it's not used in build-definitions repo. Is it possible it's deprecated script?

I've found no use of it anywhere

or set(keys) & set(cachi2_indexed_packages.keys())
)

return package_is_duplicated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The whole package de-duplication approach for SPDX (the entire diff up to this function) is a bit overcomplicated and adds lots of code duplication.

Both syft and cachi2 never generate more than one purl for a package, and the purl format doesn't change between CDX and SPDX. So the deduplication logic should be identical between CDX and SPDX. The only difference should be whether we get the purl from .purl or from .externalReferences with .referenceType == "purl"

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code duplication is intentional. Once we remove cyclonedx support, we can simply remove all non spdx related functions. If we enforce to support only one purl per package then logic could be simplified

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we enforce to support only one purl per package then logic could be simplified

yes please 🙏

Comment on lines +476 to +465
filtered_packages = []
for p in syft_sbom.get("packages", []):
if is_duplicate_package(p):
if get_package_key(p) in cachi2_packages_map:
cpackage = cachi2_packages_map[get_package_key(p)]
cpackage["externalRefs"] = sorted(
merge_external_refs(cpackage.get("externalRefs", []), p.get("externalRefs", [])),
key=lambda x: (
x["referenceCategory"],
x["referenceType"],
x["referenceLocator"],
),
)
cpackage["annotations"] = merge_annotations(cpackage.get("annotations", []), p.get("annotations", []))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we merging the duplicated packages together for SPDX when we don't do it for CycloneDX? For CycloneDX, we simply drop the duplicated packages

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for cyclonedx if the component is duplicated it's based on purl. That means the component is duplicate for sure. With SPDX considering possibility of multiple purls, is_duplicate_package returns True even on partial match of purls. Therefore considered removal of duplicated packages as possible information loss and that's why attributes of the duplicate package are merged with existing one instead of discarding whole package

Comment on lines +462 to +439
if relation["relatedSpdxElement"] in package_ids:
relationships.append(_relation)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we only look at relatedSpdxElement for document 1 when we look at both spdxElementId and relatedSpdxElement for document 2?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because in the first doc spdxElementId can be root element and in that case even relationships pointing to removed packages would be included

Signed-off-by: Jindrich Luza <[email protected]>
Signed-off-by: Jindrich Luza <[email protected]>
- Added merge_syft_sboms_spdx.py
- Added format explanatory comments
Signed-off-by: Jindrich Luza <[email protected]>
- Formated test spdx files
- Improved merging spdx files
- Added test to compared merged cdx and spdx outputs
- Renamed middle element to root package
- Added mandatory attributes to spdx elements

Signed-off-by: Jindrich Luza <[email protected]>
- spdx_find_doc_and_root_package wraps spdx related code
- fixed isodate generator
- fixed annotator string
- build deps are generated as SPDXRef-Image- instead of SPDXRef-container-
- added test_main_input_sbom_spdx_with_packages

Signed-off-by: Jindrich Luza <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants