-
Notifications
You must be signed in to change notification settings - Fork 594
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SBOM cataloger #1029
SBOM cataloger #1029
Conversation
dbf7411
to
6a04ac1
Compare
Related #737 We should catalog sboms using known file extensions. Currently each format that syft supports has a registered file name/extension. spdx - *.spdx.json, *.spdx.xml, *.spdx We should use the above file patterns for the sbom cataloger. |
A larger question re:design is that we have now introduced a dependency on our internal formatting library on the cataloger. The way that syft is currently structured, we have kept the formatters as a separate layer of logic that build on top of the syft sbom model. This is probably something we might want to think about as we introduce dependencies between the formatters and the catalogers. The PR also currently only parses cyclonedx SBOMs, we should probably extend it to the other formats syft supports if we agree with the proposed design. One of the other sticking points we have discussed in the past when we have talked about cataloging sboms is providing links back to the original sbom as evidence. Currently syft is able to provide details as to 'why' it included a specific component in the output. One of the blockers to implementing #737 has been figuring out an appropriate way to represent that syft itself didn't discover the cataloged components, rather it pulled them from an sbom present in the image. cc: @wagoodman, @luhring |
syft/pkg/cataloger/sbom/cataloger.go
Outdated
// NewSBOMCataloger returns a new SBOM cataloger object loaded from saved SBOM JSON. | ||
func NewSBOMCataloger() *common.GenericCataloger { | ||
globParsers := map[string]common.ParserFn{ | ||
"**/deps.json": parseSBOM, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cannot find this pattern in the recognized file patterns for cyclonedx. (See https://cyclonedx.org/specification/overview/#recognized-file-patterns)
Where does this come from?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was just not aware of CycloneDX recognized file patterns. My bad. Will fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fix! Functionally it looks great. We still have a few other open questions about functional and non-functional requirements around this feature that we will probably need more input from @wagoodman / @luhring and team. (#1029 (comment))
syft/pkg/cataloger/sbom/cataloger.go
Outdated
// NewSBOMCataloger returns a new SBOM cataloger object loaded from saved SBOM JSON. | ||
func NewSBOMCataloger() *common.GenericCataloger { | ||
globParsers := map[string]common.ParserFn{ | ||
"**/syft.json": parseSyftJson, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this is not a valid syft SBOM file pattern. The IANA registered type for syft is *.syft.json per https://www.iana.org/assignments/media-types/application/vnd.syft+json
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cleaned up
0a25448
to
b09e277
Compare
@patrikbeno I just thought about merging SBOMs or filtering it with a tool like Dependency Track. |
b09e277
to
fb468c8
Compare
rebased |
I've just had a play with this branch to look at this, as I'm keen to see this function get added. It is determinable from the SBOMs that syft discovered this component from an SBOM on disk:
Also, thanks to #1038, users can disable the SBOM cataloger should they desire. Is this still a blocking issue? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the submission @patrikbeno! I've gone through and labeled some concerns regarding the detection behavior as it stands for the current PR.
Another issue I see here is making sure that the list of packages returned by the found SBOM are deduplicated or in some way updated to say that they were discovered as part of an SBOM cataloger and NOT their original discovery point. The current implementation would make some incorrect assertions around having discovered the package on disk when actually it was discovered only as part of a declared SBOM that was parsed and returned to the package list.
In short - we really appreciate the contribution and would like some time to work on the branch to update integration/unit tests surrounding this new cataloger while also covering some of the usability concerns highlighted above.
syft/pkg/cataloger/sbom/cataloger.go
Outdated
// NewSBOMCataloger returns a new SBOM cataloger object loaded from saved SBOM JSON. | ||
func NewSBOMCataloger() *common.GenericCataloger { | ||
globParsers := map[string]common.ParserFn{ | ||
"**/*.syft.json": parseSyftJson, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure we want these to be always on by default and so broad. First, syft already makes a pretty large distinction where, when image scanning, we don't try to merge to list packages that are declared. Rather, we generally only try and put packages in the SBOM when we have validated 100% yes these are installed.
If we're going to extend image scanning to include declared packages I think the first iteration should be where a user can configure which paths they would like to check for declared packages (in SBOM format) and this behavior should by default be off so we're secure out of the box.
Looking at the SBOMs producing using this cataloger, the sourceInfo in SPDX SBOMs is not correct: {
"SPDXID": "SPDXRef-5ddd6a37123ac5cc",
"name": "AWSSDK.Core",
"licenseConcluded": "NONE",
"downloadLocation": "NOASSERTION",
"externalRefs": [
<snip>
],
"filesAnalyzed": false,
"licenseDeclared": "NONE",
"sourceInfo": "acquired package info from dotnet project assets file: /golang.spdx.json",
"versionInfo": "3.7.10.6"
}, Another example where a JAR is in an SBOM gives Were you expecting any other inaccuracies that need resolving? I can't see anything incorrect in the cyclonedx or syft that misleads regarding detection (and per #1029 (comment) in those cases it's trivial to identify packages identified by the sbom-cataloguer). |
95f3f59
to
b8f33ed
Compare
rebased |
Hi! Kind regards |
b8f33ed
to
cb10e39
Compare
rebased |
sort of, I did my best but it needs some improvements |
cb10e39
to
3016aef
Compare
improvement? |
fcd39e7
to
9ceaa0d
Compare
I've played with this branch to investigate this concern. There is no deduplication, that's consistent with other package types, e.g both the python cataloger and OS package scanners will detect python packages that are in OS packages. But the SBOM cataloger doesn't obscure the other catalogers results either. @spiffcs what are the outstanding issues for this PR, and can the community help resolve them? I think it's 1) adding integration tests 2) making this cataloger off by default. Is there anything else? |
@tofay I think that's correct
|
9ceaa0d
to
d9f7e85
Compare
rebased |
Hi @patrikbeno and @tofay, hope you are doing well. We were discussing this pull request as a team today, and we realized it would be helpful to have a live conversation about this feature. Would you be able to join one of our upcoming community meetings? They are held every two weeks on Thursdays, at 1200 EST. The next one is August 18: https://github.com/anchore/grype#join-our-community-meetings -- thanks in advance! Tim |
I tried this on a project I hope to convert to syft. It has an SPDX fragment (in tag value format) for each dependency. I noticed two issues (with regards to the SPDX fragments already there):
I guess this isn't this PR's fault but from the syft parser, but I wanted to put them out there for discussion. And potentially get an idea of the amount of effort needed to implement in a followup. |
e407663
to
d2be4b7
Compare
@patrikbeno We recently came to a conclusion internally that SBOMs with multiple sources aren't anywhere on the near-term horizon, which brought the finish line much closer to the current state of this PR. I wanted to try and get this to the finish line. I've rebased this PR and made a few additions based on multiple conversations in the community meeting and other threads:
What do you think about these additions? Happy to chat through them or remove them if that's not what you're looking for. Questions for future PRs to consider:
|
func decodeName(group string, name string) string { | ||
if group != "" { | ||
return group + "/" + name | ||
} | ||
return name | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems as though it might be causing an issue in the encode-decode-encode test, should this change be separate from this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, I do think this should land in another PR, I'll follow up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
broken into a separate PR #1345
syft/pkg/cataloger/sbom/cataloger.go
Outdated
WithParserByGlobs(makeParser(syftjson.Format()), "**/*.syft.json"). | ||
WithParserByGlobs(makeParser(cyclonedxjson.Format()), "**/bom.json", "**/*.cdx.json"). | ||
WithParserByGlobs(makeParser(cyclonedxxml.Format()), "**/bom.xml", "**/*.cdx.xml"). | ||
WithParserByGlobs(makeParser(spdx22json.Format()), "**/*.spdx.json"). | ||
WithParserByGlobs(makeParser(spdx22tagvalue.Format()), "**/*.spdx", "**/*.spdx.tv") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While this all seems accurate, would it be better to just call the generic decode without a specific format in case people name files incorrectly -- like JSON with .spdx
suffix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, I agree. I'll update
While adjusting for #1029 (comment) to generalize the SBOM parsing, it became apparent that high-level API functions were needed. All of the functions under the top-level API at |
Signed-off-by: Patrik Beno <[email protected]>
and add integration test Signed-off-by: Patrik Beno <[email protected]>
Signed-off-by: Patrik Beno <[email protected]>
Signed-off-by: Patrik Beno <[email protected]>
Signed-off-by: Patrik Beno <[email protected]>
Signed-off-by: Patrik Beno <[email protected]>
Signed-off-by: Patrik Beno <[email protected]>
Signed-off-by: Alex Goodman <[email protected]>
5f1af31
to
408d549
Compare
Signed-off-by: Alex Goodman <[email protected]>
408d549
to
b377cc4
Compare
Huge thanks to everyone who contributed thoughts, comments, and guidance on this PR! Special shoutout to @patrikbeno for getting the ball rolling and starting the discussion. Congrats to @wagoodman for getting it finalized and @kzantow for the review needed to get something like this merged into syft. |
Thanks @spiffcs 🙌 -- @patrikbeno let me know if there is anything about the changes mentioned in #1029 (comment) that don't sit right or you have questions on and we can follow up with changes in another PR. |
Very exciting to see this PR merged, amazing work everyone!
|
* SBOM cataloger Signed-off-by: Patrik Beno <[email protected]> * sbom-cataloger: turn off by default and add integration test Signed-off-by: Patrik Beno <[email protected]> * SBOM cataloger Signed-off-by: Patrik Beno <[email protected]> * SBOM cataloger (optimize) Signed-off-by: Patrik Beno <[email protected]> * SBOM cataloger (fix) Signed-off-by: Patrik Beno <[email protected]> * SBOM cataloger (fix imports anchore#1172) Signed-off-by: Patrik Beno <[email protected]> * SBOM cataloger (fix: support group attribute in CDX SBOMs) Signed-off-by: Patrik Beno <[email protected]> * port to generic cataloger and add relationship to original file Signed-off-by: Alex Goodman <[email protected]> * generalize parser for all format globs Signed-off-by: Alex Goodman <[email protected]> Signed-off-by: Patrik Beno <[email protected]> Signed-off-by: Alex Goodman <[email protected]> Co-authored-by: Tom Fay <[email protected]> Co-authored-by: Alex Goodman <[email protected]>
Supports ad hoc binary application layouts:
CI saves CycloneDX SBOM JSON in docker image or application deployment/installation folder.
During image/directory scanning, Syft does not need to rely on specific layouts supported by specific catalogers.
Instead, application may use whatever layout and format it needs, and provides SBOM JSON artifact for integration with Syft or other tools.
Typical scenario is a node/react/webpack build folder with static content for web server. This particular use case can be supported by providing npm/yarn lock file and enabling javascript-lock-cataloger (#1022).
SBOM cataloger provides more generic approach to this problem.