Commit: Integrate snippet scanning into FOSSA CLI (#1298)
Showing 18 changed files with 775 additions and 38 deletions.
@@ -0,0 +1,106 @@
## `fossa snippets`

This subcommand is the home for FOSSA's snippet scanning feature.

It is made up of two subcommands:

- [`fossa snippets analyze`](./snippets/analyze.md)
- [`fossa snippets commit`](./snippets/commit.md)

See the pages linked above for more details.

## Quickstart

```shell
# Set your API key. Get this from the FOSSA web application.
# On Windows (PowerShell), use this instead: $env:FOSSA_API_KEY="XXXX"
export FOSSA_API_KEY=XXXX

# Navigate to your project directory.
cd "$MY_PROJECT_DIR"

# Analyze the project for local snippet matches.
# Match data is written to the directory specified by the `-o` or `--output` argument.
# If desired, you can manually review the matches written to that directory.
fossa snippets analyze -o snippets

# Commit matched snippets to a `fossa-deps` file.
# Provide the same directory that was passed to `fossa snippets analyze`.
# This creates a `fossa-deps` file in your project.
#
# Note that you can control which kinds of snippets are committed;
# see the subcommand documentation for more details.
fossa snippets commit --analyze-output snippets

# Run a standard FOSSA analysis. Because the snippet-scanned dependencies
# are now stored in your `fossa-deps` file, they are uploaded as well.
fossa analyze
```

## FAQ

### Is my source code sent to FOSSA's servers?

**Short version: No.** The details are below.

FOSSA CLI fingerprints your first-party source code but does not send the code itself to the server.
The fingerprint is a SHA-256 hash of the content that makes up the snippet.

FOSSA CLI does send the fingerprint to the server, but because SHA-256 is a
[cryptographic hash function](https://en.wikipedia.org/wiki/SHA-2) designed to be one-way, it is effectively not possible
for FOSSA to reconstruct the original code from the fingerprint.
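
To illustrate what a fingerprint looks like, here is a minimal Bash sketch that hashes a snippet's text with SHA-256 and base64-encodes the raw digest (the analysis output describes `fingerprint` as a base64 representation). It assumes GNU coreutils (`sha256sum`, `base64`). The `fingerprint` function is hypothetical, and the exact bytes FOSSA CLI hashes (for example, after normalization) are not specified on this page, so it should not be expected to reproduce FOSSA's values.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of FOSSA CLI: compute base64(SHA-256(text)).
# Assumes Bash and GNU coreutils (sha256sum, base64).
fingerprint() {
  local hex
  # Hash the exact bytes of the snippet; printf adds no trailing newline.
  hex=$(printf '%s' "$1" | sha256sum | cut -d' ' -f1)
  # Rewrite the hex digest as \xHH escapes, decode them to raw bytes with
  # printf %b, then base64-encode the 32-byte digest.
  printf '%b' "$(printf '%s' "$hex" | sed 's/../\\x&/g')" | base64
}

fingerprint 'int add(int a, int b) { return a + b; }'
```

Only the 44-character digest leaves the machine; recovering the snippet from it would require guessing the original text.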

Of course, if a fingerprint matches, FOSSA could infer that the project contains that snippet of code.
However, because FOSSA CLI does not send any additional context about the file, there's no way for FOSSA or anyone else
to make use of this information.

The code that performs this is open source in this CLI;
users can also use tooling such as [echotraffic](https://github.com/fossas/echotraffic)
to report the information being uploaded.

### How does FOSSA snippet scanning work?

FOSSA snippet scanning operates over a matrix of options:

```
Targets × Kinds × Methods
```

Valid options for `Targets` are:

Target     | Description
-----------|-----------------------------------------------------------------------
`Function` | Considers function declarations in the source code as snippet targets.

Valid options for `Kinds` are:

Kind        | Description
------------|----------------------------------------------
`Full`      | The full expression that makes up the target.
`Signature` | The function signature of `Function` targets.
`Body`      | The function body of `Function` targets.
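
For intuition about how the three kinds relate, the Bash sketch below splits a one-line C function at its first `{` into Full, Signature, and Body text. This is a naive assumption for illustration only: real extraction depends on parsing the source language, and whether FOSSA's `Body` includes the surrounding braces is not specified here. `split_function` is a hypothetical helper, not part of FOSSA CLI.

```shell
#!/usr/bin/env bash
# Hypothetical helper: naively split a one-line C function into the
# Full, Signature, and Body kinds by cutting at the first `{`.
# Assumes the signature itself contains no `{`.
split_function() {
  local full="$1"
  local signature="${full%%\{*}"   # everything before the first `{`
  signature="${signature% }"       # drop one trailing space, if present
  local body="{${full#*\{}"        # the first `{` and everything after it
  printf 'Full: %s\nSignature: %s\nBody: %s\n' "$full" "$signature" "$body"
}

split_function 'int add(int a, int b) { return a + b; }'
```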

Valid options for `Methods` are:

Method              | Description
--------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------
`Raw`               | The expression that makes up the target, as written in the source code file.
`NormalizedSpace`   | The expression with every character in the Unicode [whitespace character class][] replaced with a space, and any run of contiguous spaces collapsed to a single space.
`NormalizedComment` | The expression with comments removed, as defined by the source code language.
`NormalizedCode`    | Equivalent to `NormalizedComment` followed by `NormalizedSpace`.
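
The normalization methods can be approximated in Bash as shown below. This is a sketch under stated assumptions: `tr`'s `[:space:]` class covers only the whitespace of the current locale (ASCII whitespace in the `C` locale used here), not the full Unicode whitespace class, and the hypothetical `normalize_comment` strips only `//` line comments, ignoring block comments and `//` inside string literals. The function names are illustrative, not FOSSA CLI internals.

```shell
#!/usr/bin/env bash
# Hypothetical approximations of the normalization methods (not FOSSA CLI code).

# NormalizedSpace: map every whitespace character to a space, then squeeze
# runs of spaces into one. ASCII-only stand-in for the Unicode whitespace class.
normalize_space() {
  LC_ALL=C tr -s '[:space:]' ' '
}

# NormalizedComment: naive removal of `//` line comments only.
normalize_comment() {
  sed 's://.*$::'
}

# NormalizedCode: NormalizedComment followed by NormalizedSpace.
normalize_code() {
  normalize_comment | normalize_space
}

printf 'int add(int a, int b) {  // sum\n\treturn a + b;\n}' | normalize_code
echo
```

The order in `normalize_code` matters: comments are removed first, so the whitespace they leave behind is collapsed by the space normalization.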

Given these options, the fully defined matrix of options is as follows:

```
{Function} × {Full, Signature, Body} × {Raw, NormalizedSpace, NormalizedComment, NormalizedCode}
```
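
Expanding the product gives 1 × 3 × 4 = 12 possible snippet variants for each function. The short Bash loop below enumerates them using the names from the tables above; it only illustrates the matrix and is not a claim about how FOSSA CLI iterates internally.

```shell
#!/usr/bin/env bash
# Enumerate the Targets × Kinds × Methods matrix described above.
targets=(Function)
kinds=(Full Signature Body)
methods=(Raw NormalizedSpace NormalizedComment NormalizedCode)

combinations=()
for target in "${targets[@]}"; do
  for kind in "${kinds[@]}"; do
    for method in "${methods[@]}"; do
      combinations+=("$target/$kind/$method")
    done
  done
done

printf '%s\n' "${combinations[@]}"
echo "total: ${#combinations[@]}"   # 1 × 3 × 4 = 12
```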

FOSSA then scans open source projects for these snippets and records them along with their metadata,
such as the project the snippet came from and where in the file it originated.

Finally, when users scan their first-party projects, FOSSA extracts snippets in the same manner
and compares the fingerprints of those snippets against the database.
If a match is found, FOSSA reports all open source projects in which the snippet was found,
along with the recorded metadata about that snippet.
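
Conceptually, the matching step is a lookup keyed by fingerprint. The Bash/awk sketch below illustrates it with a hypothetical tab-separated "knowledgebase" file (`fingerprint<TAB>project`) standing in for FOSSA's database, which in reality is queried on FOSSA's servers; the file layout and the `match_fingerprints` name are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Toy illustration of fingerprint matching (not how FOSSA stores its data).
# $1: hypothetical knowledgebase file, lines of "fingerprint<TAB>project".
# $2: file of local fingerprints, one per line.
# Prints each matching local fingerprint with every project it was found in.
match_fingerprints() {
  awk -F'\t' '
    NR == FNR {
      # First file: collect all projects recorded for each fingerprint.
      if ($1 in projects) projects[$1] = projects[$1] ", " $2
      else projects[$1] = $2
      next
    }
    # Second file: report local fingerprints that exist in the knowledgebase.
    ($1 in projects) { print $1 "\t" projects[$1] }
  ' "$1" "$2"
}

# Prints: AAA<TAB>project-a, project-c
match_fingerprints \
  <(printf 'AAA\tproject-a\nBBB\tproject-b\nAAA\tproject-c\n') \
  <(printf 'AAA\nZZZ\n')
```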

[whitespace character class]: https://en.wikipedia.org/wiki/Unicode_character_property#Whitespace
@@ -0,0 +1,110 @@
## `fossa snippets analyze`

This subcommand extracts snippets from a user project and compares them to the FOSSA database of snippets.
Any matches are then written to the directory provided.

## Options

Argument             | Required | Default                 | Description
---------------------|----------|-------------------------|---------------------------------------------------------------------------------------------------------------------------------------
`-o` / `--output`    | Yes      | None                    | The directory to which matches are written.
`--debug`            | No       | No                      | Enables debug mode. Note that debug bundles are not currently exported with `fossa snippets`, but the debug output is similarly useful.
`--overwrite-output` | No       | No                      | If specified, overwrites the directory indicated by `--output`.
`--target`           | No       | `function`              | If specified, extracts and matches only the specified targets. Specify multiple options by providing this argument multiple times.
`--kind`             | No       | `full, signature, body` | If specified, extracts and matches only the specified kinds. Specify multiple options by providing this argument multiple times.
`--transform`        | No       | `space, comment, code`  | If specified, extracts and matches only the specified transforms. Specify multiple options by providing this argument multiple times.

> [!NOTE]
> `--transform` corresponds to the `Normalized` methods [listed here](../snippets.md#how-does-fossa-snippet-scanning-work).
> The `Raw` method is always enabled and cannot be disabled.

## Output

Matches are written to the location specified by the `--output` (or `-o`) argument.

The output directory consists of a set of flat files, each representing a file in the scan directory
that had at least one matching snippet. Each output file is named with the path of the scanned file relative to
the scan directory, with any path separators replaced by underscores and a `.json` extension appended.

For example, consider the following project:

```
example-project/
  lib/
    lib.c
    vendor/
      openssh/
        openssh.c
  main.c
```

When scanned with `fossa snippets analyze -o snippets`,
it would produce the following output if every file contained a snippet match:

```
snippets/
  lib_lib.c.json
  lib_vendor_openssh_openssh.c.json
  main.c.json
```
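
The naming rule can be reproduced with a one-line Bash function, which is handy when scripting against the output directory. `output_name` is a hypothetical helper, and it assumes `/` as the path separator.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a scanned file's path (relative to the scan
# directory) to the name of its output file, following the rule above.
output_name() {
  local rel="$1"
  # Replace every "/" with "_", then append ".json".
  printf '%s.json\n' "${rel//\//_}"
}

output_name lib/vendor/openssh/openssh.c   # prints lib_vendor_openssh_openssh.c.json
```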

The content of each of these files is a JSON-encoded array of matches,
where each object in the array consists of the following keys:

Key                 | Description
--------------------|-------------------------------------------------------------------------------
`found_in`          | The relative path of the local file in which the snippet match was found.
`local_text`        | The text that matched the snippet in the local file.
`local_snippet`     | Information about the snippet extracted from the local file.
`matching_snippets` | A collection of snippets from the FOSSA knowledgebase that match this snippet.

The `local_snippet` object has the following keys:

Key           | Description
--------------|---------------------------------------------------------------------------
`fingerprint` | The base64 representation of the snippet fingerprint.
`target`      | The target (kind of source code item) that matched for this snippet.
`kind`        | The kind of snippet that was matched.
`method`      | The normalization method applied to the matching snippet.
`file_path`   | The path of the file containing the snippet, relative to the project root.
`byte_start`  | The byte index in the file at which the snippet begins.
`byte_end`    | The byte index in the file at which the snippet ends.
`line_start`  | The line number in the file at which the snippet begins.
`line_end`    | The line number in the file at which the snippet ends.
`col_start`   | The column number on line `line_start` at which the snippet begins.
`col_end`     | The column number on line `line_end` at which the snippet ends.
`language`    | The language of the identified snippet.
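
To spot-check a match, the position fields can be used to pull the snippet back out of the local file. The Bash sketch below assumes zero-based byte offsets with an exclusive `byte_end`, and one-based inclusive line numbers. This page does not specify either convention, so compare the result against `local_text` before relying on it. `extract_bytes` and `extract_lines` are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print bytes [start, end) of a file.
# ASSUMPTION: byte_start is zero-based and byte_end is exclusive.
extract_bytes() {
  local file="$1" start="$2" end="$3"
  # tail -c +N starts at the N-th byte (one-based); head -c keeps end-start bytes.
  tail -c +"$((start + 1))" "$file" | head -c "$((end - start))"
}

# Hypothetical helper: print lines first..last of a file.
# ASSUMPTION: line_start and line_end are one-based and inclusive.
extract_lines() {
  local file="$1" first="$2" last="$3"
  sed -n "${first},${last}p" "$file"
}
```

For example, `extract_lines lib/lib.c 10 24` prints lines 10 through 24 of `lib/lib.c` for comparison with the reported match.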

Each entry in the `matching_snippets` collection has the following keys:

Key           | Description
--------------|---------------------------------------------------------------------------
`locator`     | The FOSSA identifier for the project to which this snippet belongs.
`fingerprint` | The base64 representation of the snippet fingerprint.
`target`      | The target (kind of source code item) that matched for this snippet.
`kind`        | The kind of snippet that was matched.
`method`      | The normalization method applied to the matching snippet.
`file_path`   | The path of the file containing the snippet, relative to the project root.
`byte_start`  | The byte index in the file at which the snippet begins.
`byte_end`    | The byte index in the file at which the snippet ends.
`line_start`  | The line number in the file at which the snippet begins.
`line_end`    | The line number in the file at which the snippet ends.
`col_start`   | The column number on line `line_start` at which the snippet begins.
`col_end`     | The column number on line `line_end` at which the snippet ends.
`language`    | The language of the identified snippet.
`ingest_id`   | The ingestion run that discovered this snippet (not meaningful to users).
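
When reviewing results, it can help to see which knowledgebase projects your code matched most often. The Bash sketch below uses `jq` (assumed to be installed; it is not part of FOSSA CLI) to count matching snippets per `locator` across every file in the output directory. `summarize_locators` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count matching knowledgebase snippets per project
# locator across a `fossa snippets analyze` output directory. Requires jq.
summarize_locators() {
  local dir="$1"
  # Emit one locator per matching snippet, then count and sort by frequency.
  jq -r '.[].matching_snippets[].locator' "$dir"/*.json | sort | uniq -c | sort -rn
}
```

Run it as `summarize_locators snippets` after `fossa snippets analyze -o snippets`.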

## Correcting Matches

To correct matches, users may manually edit the contents of this directory,
or the files within it, to alter or remove matches.

For example, if a snippet in the local code matches
a snippet in the FOSSA knowledgebase but is known to be a false positive,
users can script the removal of that snippet match from this directory before
committing these results with `fossa snippets commit`.
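
As one way to script such a removal, the Bash sketch below uses `jq` (assumed to be installed) to drop every match whose `local_snippet.fingerprint` equals a given value from each output file, leaving all other matches untouched. `drop_fingerprint` is a hypothetical helper; review the edited files before running `fossa snippets commit`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove matches with a given local fingerprint from
# every JSON file in a `fossa snippets analyze` output directory. Requires jq.
drop_fingerprint() {
  local dir="$1" fp="$2" file tmp
  for file in "$dir"/*.json; do
    [ -e "$file" ] || continue   # the glob matched nothing
    tmp=$(mktemp)
    # Keep only matches whose local fingerprint differs from $fp.
    if jq --arg fp "$fp" 'map(select(.local_snippet.fingerprint != $fp))' "$file" > "$tmp"; then
      mv "$tmp" "$file"
    else
      rm -f "$tmp"
      return 1
    fi
  done
}
```

For example, `drop_fingerprint snippets "$FALSE_POSITIVE_FINGERPRINT"` edits the `snippets` directory in place.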

## Next Steps

After running `fossa snippets analyze`, the next step is to run `fossa snippets commit`.

These are separate steps so that users can review or edit the matched data
before submitting the results to FOSSA.
@@ -0,0 +1,44 @@
## `fossa snippets commit`

This subcommand commits the analysis performed by the `analyze` subcommand into a `fossa-deps` file ([reference](../../files/fossa-deps.md)).
For more information on possible options, run `fossa snippets commit --help`.

## Options

Argument                 | Required | Default                 | Description
-------------------------|----------|-------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------
`--analyze-output`       | Yes      | None                    | The directory to which `fossa snippets analyze` wrote its matches.
`--debug`                | No       | No                      | Enables debug mode. Note that debug bundles are not currently exported with `fossa snippets`, but the debug output is similarly useful.
`--overwrite-fossa-deps` | No       | No                      | If specified, overwrites the `fossa-deps` file if present.
`--target`               | No       | `function`              | If specified, commits only matches consisting of the specified targets. Specify multiple options by providing this argument multiple times.
`--kind`                 | No       | `full, signature, body` | If specified, commits only matches consisting of the specified kinds. Specify multiple options by providing this argument multiple times.
`--transform`            | No       | `space, comment, code`  | If specified, commits only matches consisting of the specified transforms. Specify multiple options by providing this argument multiple times.
`--format`               | No       | `yml`                   | Configures the format of the generated `fossa-deps` file.

> [!NOTE]
> `--transform` corresponds to the `Normalized` methods [listed here](../snippets.md#how-does-fossa-snippet-scanning-work).
> The `Raw` method is always enabled and cannot be disabled.

## Input

The main input this subcommand requires is the path to the directory to which `analyze`
wrote its output. Users can also choose which kinds of matches to commit and customize the output format
of the created `fossa-deps` file.

## Output

The result of this subcommand is a `fossa-deps` file written to the root of the project directory.

> [!NOTE]
> This subcommand does not overwrite an existing `fossa-deps` file by default
> (use `--overwrite-fossa-deps` to allow this), and it currently does not merge its output into an existing `fossa-deps` file.
>
> However, users can customize the output format (via `--format`) and then
> perform scripted merges themselves.
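
Because the output is not merged into an existing file, one cautious approach is to move any existing `fossa-deps` file aside before committing, then merge the two by hand or with your own tooling. The Bash sketch below assumes the file is named `fossa-deps.yml`, `fossa-deps.yaml`, or `fossa-deps.json` at the project root (check the `fossa-deps` reference for the exact supported names). `backup_fossa_deps` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: move any existing fossa-deps file in a project root
# aside (adding a ".bak" suffix) so `fossa snippets commit` can write a fresh one.
# ASSUMPTION: the file is named fossa-deps.yml, fossa-deps.yaml, or fossa-deps.json.
backup_fossa_deps() {
  local root="$1" name
  for name in fossa-deps.yml fossa-deps.yaml fossa-deps.json; do
    if [ -e "$root/$name" ]; then
      # Never clobber an earlier backup.
      if [ -e "$root/$name.bak" ]; then
        echo "refusing to overwrite $name.bak" >&2
        return 1
      fi
      mv -- "$root/$name" "$root/$name.bak"
      echo "moved $name to $name.bak"
    fi
  done
}

# Usage (requires FOSSA CLI):
#   backup_fossa_deps . && fossa snippets commit --analyze-output snippets
```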

## Next Steps

After running `fossa snippets commit`, the next step is to run `fossa analyze` on the project.

FOSSA CLI will then pick up the dependencies listed in that `fossa-deps` file and report them
as dependencies of the project.
```diff
@@ -1,6 +1,6 @@
 [package]
 name = "millhone"
-version = "0.3.1"
+version = "0.3.2"
 edition = "2021"
 
 [features]
```