Integrate snippet scanning into FOSSA CLI (#1298)
jssblck authored Oct 12, 2023
1 parent cd082d4 commit e3c5544
Showing 18 changed files with 775 additions and 38 deletions.
2 changes: 1 addition & 1 deletion Cargo.lock


5 changes: 5 additions & 0 deletions Changelog.md
@@ -1,5 +1,10 @@
# FOSSA CLI Changelog

## v3.8.17

Integrates FOSSA snippet scanning into the main application.
For more details and a quick start guide, see [the subcommand reference](./docs/references/subcommands/snippets.md).

## v3.8.16

Delivers another update to the `millhone` early preview of FOSSA snippet scanning:
1 change: 1 addition & 0 deletions docs/README.md
@@ -61,6 +61,7 @@ Reference guides provide an exhaustive listing of all CLI functionality. If you
- [`fossa test`](./references/subcommands/test.md)
- [`fossa report`](./references/subcommands/report.md)
- [`fossa list-targets`](./references/subcommands/list-targets.md)
- [`fossa snippets`](./references/subcommands/snippets.md)
<!-- TODO Write this README file
- [Common flags and options] -->
- CLI configuration files
106 changes: 106 additions & 0 deletions docs/references/subcommands/snippets.md
@@ -0,0 +1,106 @@
## `fossa snippets`

This subcommand is the home for FOSSA's snippet scanning feature.

It is made up of two subcommands:

- [`fossa snippets analyze`](./snippets/analyze.md)
- [`fossa snippets commit`](./snippets/commit.md)

See the pages linked above for more details.

## Quickstart

```shell
# Set your API key. Get this from the FOSSA web application.
# On Windows (PowerShell), use this instead: $env:FOSSA_API_KEY="XXXX"
export FOSSA_API_KEY=XXXX

# Navigate to your project directory.
cd $MY_PROJECT_DIR

# Analyze the project for local snippet matches.
# Match data is written to the directory given by the `-o` (or `--output`) argument.
# If desired, you can manually review the matches in that directory.
fossa snippets analyze -o snippets

# Commit matched snippets to a `fossa-deps` file.
# Provide it the same directory provided to `fossa snippets analyze`.
# This creates a `fossa-deps` file in your project.
#
# Note that you can control what kinds of snippets are committed;
# see subcommand documentation for more details.
fossa snippets commit --analyze-output snippets

# Run a standard FOSSA analysis. This also uploads the snippet-scanned dependencies,
# since they are now stored in your `fossa-deps` file.
fossa analyze
```

## FAQ

### Is my source code sent to FOSSA's servers?

**Short version: No.** More detail explaining this is below.

FOSSA CLI fingerprints your first party source code but does not send it to the server.
The fingerprint is a SHA-256 hashed representation of the content that made up the snippet.
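
As a rough illustration of the fingerprinting described above, the following Python sketch hashes a snippet's text with SHA-256 and encodes the digest as base64. The function name, the UTF-8 encoding, and the choice of exactly which bytes are hashed are assumptions made for this example; the CLI's actual implementation may differ.

```python
import base64
import hashlib


def fingerprint(snippet_text: str) -> str:
    # Assumption: the snippet is hashed as UTF-8 bytes. Illustrative only.
    digest = hashlib.sha256(snippet_text.encode("utf-8")).digest()
    # Encode the 32-byte digest as base64 so it can be stored as text.
    return base64.b64encode(digest).decode("ascii")


snippet = "int add(int a, int b) { return a + b; }"
print(fingerprint(snippet))
```

The digest has the same length regardless of how long the snippet is, and the snippet text cannot be read back out of it.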

FOSSA CLI does send the fingerprint to the server, but since SHA-256 is a
[cryptographically secure](https://en.wikipedia.org/wiki/SHA-2) one-way hash function, it is effectively not possible
for FOSSA to reconstruct the original code that went into the snippet.

Of course, if the fingerprint matches, FOSSA could then infer that the project contains that snippet of code,
but since FOSSA CLI does not send any additional context from the file, there's no way for FOSSA or anyone else
to make use of this information.

The code that performs this is open source as part of this CLI;
users can also use tooling such as [echotraffic](https://github.com/fossas/echotraffic)
to report the information being uploaded.

### How does FOSSA snippet scanning work?

FOSSA snippet scanning operates over a matrix of options:

```
Targets × Kinds × Methods
```

Valid options for `Targets` are:

Target | Description
-----------|-----------------------------------------------------------------------
`Function` | Considers function declarations in the source code as snippet targets.

Valid options for `Kinds` are:

Kind | Description
------------|----------------------------------------------
`Full` | The full expression that makes up the target.
`Signature` | The function signature of `Function` targets.
`Body` | The function body of `Function` targets.

Valid options for `Methods` are:

Method | Description
--------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------
`Raw` | The expression that makes up the target as written in the source code file.
`NormalizedSpace` | The expression with any character in the Unicode [whitespace character class][] replaced with a space, and any contiguous spaces collapsed to a single space.
`NormalizedComment` | The expression with comments removed, as defined by the source code language.
`NormalizedCode` | Equivalent to `NormalizedComment` followed by `NormalizedSpace`.
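
To make the methods in the table above more concrete, here is a minimal Python sketch of the three normalizations. It is not the CLI's implementation: the function names are hypothetical, Python's `\s` is used as an approximation of the Unicode whitespace class, and comment removal is shown only for C-style comments using a naive pattern that ignores comment markers inside string literals.

```python
import re


def normalize_space(text: str) -> str:
    # Replacing every whitespace character with a space and then collapsing
    # runs of spaces is equivalent to replacing each whitespace run with one space.
    return re.sub(r"\s+", " ", text)


def normalize_comment(text: str) -> str:
    # Assumption: C-style comments only. Real comment syntax depends on the language.
    without_block = re.sub(r"/\*.*?\*/", "", text, flags=re.DOTALL)
    return re.sub(r"//[^\n]*", "", without_block)


def normalize_code(text: str) -> str:
    # NormalizedCode: comment removal followed by space normalization.
    return normalize_space(normalize_comment(text))


source = "int add(int a, int b) {\n    // sum the inputs\n    return a + b; /* done */\n}"
print(normalize_code(source))
```

Two copies of a function that differ only in comments or formatting produce the same text after `normalize_code`, which is why normalized methods can match code that `Raw` would miss.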

Given these options, the fully defined matrix of options is as follows:

```
{Function} × {Full, Signature, Body} × {Raw, NormalizedSpace, NormalizedComment, NormalizedCode}
```
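
The size of this matrix can be checked with a short Python sketch that enumerates every combination. The list names are chosen for the example; the values come from the tables above.

```python
from itertools import product

TARGETS = ["Function"]
KINDS = ["Full", "Signature", "Body"]
METHODS = ["Raw", "NormalizedSpace", "NormalizedComment", "NormalizedCode"]

# Every (target, kind, method) combination in the matrix.
combinations = list(product(TARGETS, KINDS, METHODS))

for target, kind, method in combinations:
    print(f"{target} x {kind} x {method}")

print(len(combinations))
```

With one target, three kinds, and four methods, the matrix contains 1 × 3 × 4 = 12 combinations.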

FOSSA then scans open source projects for these snippets and records them along with their metadata,
such as where in the file the snippet originated and from what project.

Finally, when users scan their first-party projects, FOSSA extracts snippets in the same manner
and compares the fingerprints of the content of those snippets against the database.
If a match is found, FOSSA reports all open source projects in which the snippet was found,
along with recorded metadata about that snippet.
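
The record-then-compare flow described above can be sketched in Python with an in-memory dictionary standing in for the database. Everything here is hypothetical: the `SnippetIndex` class, the project names, and the fingerprint function (SHA-256 over UTF-8 text, base64 encoded) are assumptions for illustration and say nothing about how FOSSA's service is built.

```python
import base64
import hashlib
from collections import defaultdict


def fingerprint(text: str) -> str:
    # Assumption: SHA-256 over the UTF-8 bytes of the snippet, base64 encoded.
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii")


class SnippetIndex:
    """Hypothetical in-memory stand-in for the snippet database."""

    def __init__(self) -> None:
        self._projects_by_fingerprint = defaultdict(list)

    def record(self, project: str, snippet_text: str) -> None:
        # Store only the fingerprint and the project it was found in.
        self._projects_by_fingerprint[fingerprint(snippet_text)].append(project)

    def lookup(self, snippet_text: str) -> list:
        # Return every project recorded with the same fingerprint.
        return list(self._projects_by_fingerprint.get(fingerprint(snippet_text), []))


index = SnippetIndex()
index.record("example-project-a", "int add(int a, int b) { return a + b; }")
index.record("example-project-b", "int add(int a, int b) { return a + b; }")

print(index.lookup("int add(int a, int b) { return a + b; }"))
print(index.lookup("int sub(int a, int b) { return a - b; }"))
```

The lookup compares fingerprints only, so the side performing the comparison never needs the snippet text itself.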

[whitespace character class]: https://en.wikipedia.org/wiki/Unicode_character_property#Whitespace
110 changes: 110 additions & 0 deletions docs/references/subcommands/snippets/analyze.md
@@ -0,0 +1,110 @@
## `fossa snippets analyze`

This subcommand extracts snippets from a user project and compares them to the FOSSA database of snippets.
Any matches are then written to the directory provided.

## Options

Argument | Required | Default | Description
---------------------|----------|------------------------|--------------------------------------------------------------------------------------------------------------------------------------
`-o` / `--output` | Yes | None | The directory to which matches are output.
`--debug` | No | No | Enables debug mode. Note that debug bundles are not currently exported with `fossa snippets`, but this output is similarly useful.
`--overwrite-output` | No | No | If specified, overwrites the directory indicated by `--output`.
`--target` | No | `function` | If specified, extracts and matches only the specified targets. Specify multiple options by providing this argument multiple times.
`--kind` | No | `full, signature, body` | If specified, extracts and matches only the specified kinds. Specify multiple options by providing this argument multiple times.
`--transform` | No | `space, comment, code` | If specified, extracts and matches only the specified transforms. Specify multiple options by providing this argument multiple times.

> [!NOTE]
> `--transform` corresponds to the `Normalized` methods [listed here](../snippets.md#how-does-fossa-snippet-scanning-work).
> The `Raw` method is always enabled and cannot be disabled.

## Output

Matches are written to the location specified by the `--output` (or `-o`) argument.

The output directory consists of a set of flat files, each representing a file in the scan directory
that had at least one matching snippet. These files are named with the path of the file relative to
the scan directory, with any path separators replaced by underscores, and a `.json` extension appended.

For example, the following project:
```
example-project/
  lib/
    lib.c
    vendor/
      openssh/
        openssh.c
  main.c
```

When scanned with `fossa snippets analyze -o snippets`,
it produces the following output, assuming every file contains a snippet match:
```
snippets/
  lib_lib.c.json
  lib_vendor_openssh_openssh.c.json
  main.c.json
```
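
The naming rule above can be expressed as a small Python sketch. The function name is hypothetical, and treating both `/` and `\` as path separators is an assumption; the text above does not say how name collisions (for example `lib_a.c` versus `lib/a.c`) are handled, so this sketch does not address them.

```python
def match_file_name(relative_path: str) -> str:
    # Replace path separators with underscores, then append `.json`.
    flattened = relative_path.replace("\\", "_").replace("/", "_")
    return flattened + ".json"


for path in ["lib/lib.c", "lib/vendor/openssh/openssh.c", "main.c"]:
    print(match_file_name(path))
```

This is useful when scripting against the output directory, since it lets a script locate the match file for a given source file.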

The content of each of these files is a JSON encoded array of matches,
where each object in the array consists of the following keys:

Key | Description
--------------------|-------------------------------------------------------------------------------
`found_in` | The relative path of the local file in which the snippet match was found.
`local_text` | The text that matched the snippet in the local file.
`local_snippet` | Information about the snippet extracted from the local file.
`matching_snippets` | A collection of snippets from the FOSSA knowledgebase that match this snippet.

The `local_snippet` object has the following keys:

Key | Description
--------------|---------------------------------------------------------------------------
`fingerprint` | The base64 representation of the snippet fingerprint.
`target` | The kind of source code item that matched for this snippet.
`kind` | The kind of snippet that was matched.
`method` | The normalization method used on the matching snippet.
`file_path` | The path of the file containing the snippet, relative to the project root.
`byte_start` | The byte index in the file at which the snippet begins.
`byte_end` | The byte index in the file at which the snippet ends.
`line_start` | The line number in the file at which the snippet begins.
`line_end` | The line number in the file at which the snippet ends.
`col_start` | The column number on the `line_start` at which the snippet begins.
`col_end` | The column number on the `line_end` at which the snippet ends.
`language` | The language of the identified snippet.

Each entry in the `matching_snippets` collection has the following keys:

Key | Description
--------------|---------------------------------------------------------------------------
`locator` | The FOSSA identifier for the project to which this snippet belongs.
`fingerprint` | The base64 representation of the snippet fingerprint.
`target` | The kind of source code item that matched for this snippet.
`kind` | The kind of snippet that was matched.
`method` | The normalization method used on the matching snippet.
`file_path` | The path of the file containing the snippet, relative to the project root.
`byte_start` | The byte index in the file at which the snippet begins.
`byte_end` | The byte index in the file at which the snippet ends.
`line_start` | The line number in the file at which the snippet begins.
`line_end` | The line number in the file at which the snippet ends.
`col_start` | The column number on the `line_start` at which the snippet begins.
`col_end` | The column number on the `line_end` at which the snippet ends.
`language` | The language of the identified snippet.
`ingest_id` | The ingestion run that discovered this snippet (not meaningful to users).
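
As a sketch of how these keys fit together, the following Python example summarizes one match record. The record is a hand-written placeholder that includes only the keys the example reads (the values, including the locators, are invented and other keys are omitted), and the `summarize` helper is hypothetical.

```python
import json

# Placeholder content shaped like one match file; values are invented.
sample_file_content = """
[
  {
    "found_in": "lib/lib.c",
    "local_snippet": {"line_start": 10, "line_end": 12},
    "matching_snippets": [
      {"locator": "example-locator-b"},
      {"locator": "example-locator-a"}
    ]
  }
]
"""


def summarize(match: dict) -> str:
    local = match["local_snippet"]
    # Collect the distinct projects in which this snippet was found.
    locators = sorted({s["locator"] for s in match["matching_snippets"]})
    span = f'{local["line_start"]}-{local["line_end"]}'
    return f'{match["found_in"]}:{span} matches {", ".join(locators)}'


for match in json.loads(sample_file_content):
    print(summarize(match))
```

A summary like this can make manual review of a large output directory quicker than reading the raw JSON.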

## Correcting Matches

To correct matches, users may manually edit or delete the files in this directory
to alter or remove matches.

For example, if a certain snippet is found in the local code that matches
a snippet in the FOSSA knowledgebase, but it's known to be a false positive,
users can script the removal of that snippet match from this directory prior to
committing these results in a FOSSA scan.
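
One way such a removal could be scripted is sketched below in Python. It relies only on the `matching_snippets` and `locator` keys documented above; the function name is hypothetical, and whether `fossa snippets commit` accepts re-serialized files or files containing an empty array is an assumption that should be verified before relying on it.

```python
import json
from pathlib import Path


def drop_locator(output_dir: str, unwanted_locator: str) -> int:
    """Remove every matching snippet with the given locator; return how many were removed."""
    removed = 0
    for path in Path(output_dir).glob("*.json"):
        matches = json.loads(path.read_text(encoding="utf-8"))
        for match in matches:
            before = len(match["matching_snippets"])
            match["matching_snippets"] = [
                s for s in match["matching_snippets"] if s["locator"] != unwanted_locator
            ]
            removed += before - len(match["matching_snippets"])
        # Drop local snippets that no longer match anything.
        matches = [m for m in matches if m["matching_snippets"]]
        path.write_text(json.dumps(matches, indent=2), encoding="utf-8")
    return removed
```

For example, `drop_locator("snippets", "example-locator-a")` would remove all matches against the placeholder project `example-locator-a` from the `snippets` directory. Keeping a copy of the original directory before editing makes it easy to undo a mistaken removal.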

## Next Steps

After running `fossa snippets analyze`, the next step is to run `fossa snippets commit`.

These are separate steps to give users the ability to edit or review the matched data
prior to submitting the results to FOSSA.
44 changes: 44 additions & 0 deletions docs/references/subcommands/snippets/commit.md
@@ -0,0 +1,44 @@
## `fossa snippets commit`

This subcommand commits the analysis performed in the `analyze` subcommand into a `fossa-deps` file ([reference](../../files/fossa-deps.md)).
For more information on possible options, run `fossa snippets commit --help`.

## Options

Argument | Required | Default | Description
-------------------------|----------|------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------
`--analyze-output` | Yes | None | The directory to which `fossa snippets analyze` output its matches.
`--debug` | No | No | Enables debug mode. Note that debug bundles are not currently exported with `fossa snippets`, but this output is similarly useful.
`--overwrite-fossa-deps` | No | No | If specified, overwrites the `fossa-deps` file if present.
`--target` | No | `function` | If specified, commits matches consisting of only the specified targets. Specify multiple options by providing this argument multiple times.
`--kind` | No | `full, signature, body` | If specified, commits matches consisting of only the specified kinds. Specify multiple options by providing this argument multiple times.
`--transform` | No | `space, comment, code` | If specified, commits matches consisting of only the specified transforms. Specify multiple options by providing this argument multiple times.
`--format` | No | `yml` | Allows configuring the format of the generated `fossa-deps` file.

> [!NOTE]
> `--transform` corresponds to the `Normalized` methods [listed here](../snippets.md#how-does-fossa-snippet-scanning-work).
> The `Raw` method is always enabled and cannot be disabled.

## Input

This subcommand requires the path to the directory in which the output of `analyze`
was written. Users can also choose which kinds of matches to commit and customize the output format
of the created `fossa-deps` file.

## Output

The result of this subcommand is a `fossa-deps` file written to the root of the project directory.

> [!NOTE]
> This subcommand will not overwrite an existing `fossa-deps` file by default,
> and currently does not merge its output into an existing `fossa-deps` file.
>
> However, users can customize the output format (via `--format`) and then
> perform scripted merges themselves.
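
The note above leaves merging to the user. As one possible approach, assuming both the existing and the generated `fossa-deps` files have already been parsed into dictionaries and that their top-level sections are lists, a merge could be sketched in Python as follows. The function name and the placeholder section names are hypothetical; consult the `fossa-deps` reference for the real schema.

```python
def merge_deps(existing: dict, generated: dict) -> dict:
    """Merge `generated` into a copy of `existing` without duplicating list entries."""
    merged = dict(existing)
    for key, value in generated.items():
        if isinstance(value, list) and isinstance(merged.get(key), list):
            # Append only the entries that are not already present.
            merged[key] = merged[key] + [item for item in value if item not in merged[key]]
        elif key not in merged:
            merged[key] = value
    return merged


# Placeholder section names and entries, for illustration only.
existing = {"section-a": [{"name": "one"}], "version": 1}
generated = {"section-a": [{"name": "one"}, {"name": "two"}], "section-b": [{"name": "three"}]}
print(merge_deps(existing, generated))
```

Scalar values already present in the existing file are kept as they are, so the existing file wins on any conflict that is not a list.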

## Next Steps

After running `fossa snippets commit`, the next step is to run `fossa analyze` on the project.

FOSSA CLI will then pick up the dependencies reported in that `fossa-deps` file and report them
as dependencies of the project.
2 changes: 1 addition & 1 deletion extlib/millhone/Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = "millhone"
-version = "0.3.1"
+version = "0.3.2"
edition = "2021"

[features]
45 changes: 15 additions & 30 deletions extlib/millhone/docs/subcommands/analyze.md
@@ -3,50 +3,35 @@
This subcommand analyzes a local project for snippets that match snippets in the FOSSA knowledgebase.
For more information on possible options, run `millhone analyze --help`.

-# Output
+## Output

Matches are written to the location specified by the `--output` (or `-o`) argument.
-If this argument is not specified, `millhone` creates a temporary directory prefixed by "millhone_".
-
-> [!NOTE]
-> Millhone by default creates this directory in the system temporary directory.
-> If desired, this can be customized:
-> - On Linux and macOS: set the `TMPDIR` environment variable.
-> - On Windows, this uses the `GetTempPath` system call, which uses the first valid option of:
->   - The path specified by the `TMP` environment variable.
->   - The path specified by the `TEMP` environment variable.
->   - The path specified by the `USERPROFILE` environment variable.
->   - The Windows directory.

The output directory consists of a set of flat files, each representing a file in the scan directory
that had at least one matching snippet. These files are named with the path of the file relative to
the scan directory, with any path separators replaced by underscores, and a `.json` extension appended.

For example, the following project:
```
-/Users/
-  me/
-    projects/
-      example-project/
-        lib/
-          lib.c
-          vendor/
-            openssh/
-              openssh.c
-        main.c
+example-project/
+  lib/
+    lib.c
+    vendor/
+      openssh/
+        openssh.c
+  main.c
```

-When scanned like `millhone analyze /Users/me/projects/example-project`,
+When scanned like `fossa snippets analyze -o snippets`,
would be presented like the below if all files contained a snippet match:
```
-/tmp/
-  millhone_abcd1234/
-    lib_lib.c.json
-    lib_vendor_openssh_openssh.c.json
-    main.c.json
+snippets/
+  lib_lib.c.json
+  lib_vendor_openssh_openssh.c.json
+  main.c.json
```

-The contents of each of these files are a JSON encoded array of matches,
+The content of each of these files is a JSON encoded array of matches,
where each object in the array consists of the following keys:

Key | Description
@@ -104,7 +89,7 @@ committing these results in a FOSSA scan.

# Next Steps

-After running `millhone analyze`, the next step is to run `millhone commit`.
+After running `fossa snippets analyze`, the next step is to run `fossa snippets commit`.

These are separate steps to give users the ability to edit or review the matched data
prior to submitting the results to FOSSA.
2 changes: 2 additions & 0 deletions extlib/millhone/src/main.rs
@@ -115,6 +115,8 @@ fn main() -> stable_eyre::Result<()> {
.with_writer(std::io::stdout)
.with_file(false)
.with_line_number(false)
.without_time()
.with_target(false)
.with_span_events(tracing_subscriber::fmt::format::FmtSpan::NONE)
.with_filter(app.level_filter())
.with_filter(self_sourced_events(app.log_level)),
4 changes: 4 additions & 0 deletions spectrometer.cabal
@@ -185,6 +185,7 @@ library
App.Fossa.Config.LinkUserBinaries
App.Fossa.Config.ListTargets
App.Fossa.Config.Report
App.Fossa.Config.Snippets
App.Fossa.Config.Test
App.Fossa.Container
App.Fossa.Container.AnalyzeNative
@@ -210,6 +211,9 @@ library
App.Fossa.Report
App.Fossa.Report.Attribution
App.Fossa.RunThemis
App.Fossa.Snippets
App.Fossa.Snippets.Analyze
App.Fossa.Snippets.Commit
App.Fossa.Subcommand
App.Fossa.Test
App.Fossa.VendoredDependency