Fuzzyhashing #152

Merged
merged 12 commits into from
Mar 20, 2024
39 changes: 39 additions & 0 deletions plugins/fuzzyhashes/README.md
@@ -0,0 +1,39 @@
# Fuzzy Hashing Plugin for SBOM Surfactant

A plugin for Surfactant that uses TLSH and SSDEEP to generate fuzzy hashes.

## Quickstart
**Note:** By default, only TLSH is enabled because SSDEEP has a more complex build process. To include SSDEEP, see the SSDEEP section.

In the same virtual environment that Surfactant was installed in, install this plugin with `pip install .`.

For developers making changes to this plugin, install it with `pip install -e .`.

The plugin writes its results to the metadata field of the main SBOM output JSON. The added metadata entry has the following format.

```json
{
  "ssdeep": "3072:zk9IYDIW/+wxfiqV/jKneO1S4r88117lHc7ws47Fg5Q+ZLgFYY5:zsIYzpQqV/YLr8811P5",
  "tlsh": "T1C3449303A267DC9FC4445AB105A75168FB38FC16CF36BB1BB242B73E6A31F009EA5640"
}
```
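To illustrate how the hashes might be read back out of an SBOM, here is a minimal sketch. It assumes the SBOM JSON has a top-level `software` list whose entries each carry a `metadata` list of objects; that layout, the sample document, and the helper name `collect_fuzzy_hashes` are assumptions made for illustration and are not defined by this plugin.

```python
import json
from typing import Dict, List

# Keys this plugin is documented to add to a metadata entry.
FUZZY_KEYS = ("tlsh", "ssdeep")


def collect_fuzzy_hashes(sbom: dict) -> List[Dict[str, str]]:
    """Return one dict of fuzzy hashes per software entry that has any.

    Assumes (hypothetically) that `sbom["software"]` is a list of entries and
    that each entry's "metadata" is a list of JSON objects.
    """
    results = []
    for entry in sbom.get("software", []):
        hashes = {}
        for item in entry.get("metadata") or []:
            if not isinstance(item, dict):
                continue
            for key in FUZZY_KEYS:
                if key in item:
                    hashes[key] = item[key]
        if hashes:
            results.append(hashes)
    return results


# Hand-written sample document, not real Surfactant output.
SAMPLE = """
{
  "software": [
    {"metadata": [{"tlsh": "T1AAAA", "ssdeep": "3:abc:def"}]},
    {"metadata": []}
  ]
}
"""

found = collect_fuzzy_hashes(json.loads(SAMPLE))
print(found)
```

When only TLSH is enabled, the `ssdeep` key would simply be absent from the result.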

Surfactant's features for enabling and disabling plugins can be used to control
whether this plugin runs, using the plugin name `surfactantplugin_fuzzyhashes` (the name given in
`pyproject.toml` under the `project.entry-points."surfactant"` section).

## SSDEEP

If you do not already have the SSDEEP library installed, you must run `export BUILD_LIB=1` before running `pip install`.

The SSDEEP package also requires the following packages on an Ubuntu 22 system:

- libtool
- build-essential
- automake

To install SSDEEP run the following:
`pip install .[ssdeep]` or `pip install -e .[ssdeep]`
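
After installing, one way to confirm which hashing libraries are importable in the current environment is a quick check like the sketch below. The helper name `is_module_available` is hypothetical; the module names `ssdeep` and `tlsh` are the ones the plugin itself imports.

```python
import importlib.util


def is_module_available(name: str) -> bool:
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(name) is not None


# The plugin always needs tlsh; ssdeep is optional.
for module_name in ("tlsh", "ssdeep"):
    print(f"{module_name} available: {is_module_available(module_name)}")
```

If `ssdeep` is reported as unavailable, the plugin still runs but only produces the `tlsh` value.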

## Uninstalling

The plugin can be uninstalled with `pip uninstall surfactantplugin-fuzzyhashes`.
36 changes: 36 additions & 0 deletions plugins/fuzzyhashes/pyproject.toml
@@ -0,0 +1,36 @@
[build-system]
requires = ["setuptools", "setuptools-scm"]
build-backend = "setuptools.build_meta"

[project]
name = "surfactantplugin-fuzzyhashes"
authors = [
{name = "Michael Cutshaw", email = "[email protected]"},
]
description = "Surfactant plugin for generating fuzzy hashes"
readme = "README.md"
requires-python = ">=3.8"
keywords = ["surfactant"]
license = {text = "MIT License"}
classifiers = [
"Programming Language :: Python :: 3",
"Environment :: Console",
"Operating System :: MacOS",
"Operating System :: Microsoft :: Windows",
"Operating System :: POSIX :: Linux",
"License :: OSI Approved :: MIT License",
]
dependencies = [
"python-tlsh",
"surfactant"
]
dynamic = ["version"]

[project.optional-dependencies]
ssdeep = ["ssdeep"]

[project.entry-points."surfactant"]
"surfactantplugin_fuzzyhashes" = "surfactantplugin_fuzzyhashes"

[tool.setuptools]
py-modules=["surfactantplugin_fuzzyhashes"]
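
The `project.entry-points."surfactant"` table above is what makes the module discoverable as a plugin. As a rough sketch of how such registrations can be inspected from Python (this uses the standard `importlib.metadata` API and is not necessarily how Surfactant itself loads plugins; the helper name `plugins_in_group` is hypothetical):

```python
from importlib import metadata


def plugins_in_group(group: str = "surfactant") -> dict:
    """Map entry-point names to their targets for the given group."""
    eps = metadata.entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+ exposes a select() method for filtering by group.
        selected = eps.select(group=group)
    else:
        # Python 3.8 / 3.9 return a plain dict keyed by group name.
        selected = eps.get(group, [])
    return {ep.name: ep.value for ep in selected}


print(plugins_in_group())
```

With this plugin installed, the result should include a `surfactantplugin_fuzzyhashes` key; in an environment without any Surfactant plugins it is an empty dict.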
66 changes: 66 additions & 0 deletions plugins/fuzzyhashes/surfactantplugin_fuzzyhashes.py
@@ -0,0 +1,66 @@
# Copyright 2024 Lawrence Livermore National Security, LLC
# See the top-level LICENSE file for details.
#
# SPDX-License-Identifier: MIT


import logging
from pathlib import Path

try:
    import ssdeep

    SSDEEP_PRESENT = True
except ImportError:
    SSDEEP_PRESENT = False
    logging.warning("SSDEEP is not installed, therefore those hashes will not be generated.")
import tlsh

import surfactant.plugin
from surfactant.sbomtypes import SBOM, Software


def do_tlsh(bin_data: bytes):
    return tlsh.hash(bin_data)


def do_ssdeep(bin_data: bytes):
    return ssdeep.hash(bin_data)


@surfactant.plugin.hookimpl(specname="extract_file_info")
def fuzzyhashes(sbom: SBOM, software: Software, filename: str, filetype: str):
    """
    Generate TLSH and potentially SSDEEP fuzzy hashes for the provided files.
    :param sbom(SBOM): The SBOM that the software entry/file is being added to. Can be used to add observations or analysis data.
    :param software(Software): The software entry associated with the file to extract information from.
    :param filename (str): The full path to the file to extract information from.
    :param filetype (str): File type information based on magic bytes.
    """

    hashdata = [(do_tlsh, "tlsh")]
    if SSDEEP_PRESENT:
        hashdata.append((do_ssdeep, "ssdeep"))
    # Validate the file path
    existing_data = {}
    filename = Path(filename)
    if not filename.exists():
        raise FileNotFoundError(f"No such file: '{filename}'")

    if all(hashname in existing_data for _, hashname in hashdata):
        # if everything is already in there, we just want to terminate without writing
        return None
    with open(filename, "rb") as f_bin:
        bin_data = f_bin.read()

    for hashfunc, hashname in hashdata:
        if hashname in existing_data:
            logging.info("Already %s hashed %s", hashname, filename.name)
        else:
            logging.info(
                "Found existing JSON file for %s but without '%s' key. Proceeding with hashing.",
                filename.name,
                hashname,
            )
            existing_data[hashname] = hashfunc(bin_data)
    return existing_data
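
In the hook as shown, `existing_data` starts out empty, so the early return and the "Already ... hashed" branch are never taken and every configured hash is computed on each call. The sketch below shows that effective flow in isolation. It is a simplified illustration, not the plugin's code: `compute_hashes` and `standin_hash` are hypothetical names, and SHA-256 is only a stand-in so the example runs without `tlsh` or `ssdeep` installed (it is not a fuzzy hash).

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable, Dict, Iterable, Tuple

# A (hash function, result key) pair, mirroring the plugin's `hashdata` list.
Hasher = Tuple[Callable[[bytes], str], str]


def compute_hashes(filename: str, hashers: Iterable[Hasher]) -> Dict[str, str]:
    """Read the file once and apply every configured hash function to it."""
    path = Path(filename)
    if not path.exists():
        raise FileNotFoundError(f"No such file: '{path}'")
    bin_data = path.read_bytes()
    return {hashname: hashfunc(bin_data) for hashfunc, hashname in hashers}


def standin_hash(bin_data: bytes) -> str:
    # Stand-in only: SHA-256 is a cryptographic hash, NOT a fuzzy hash.
    return hashlib.sha256(bin_data).hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"example payload" * 64)
    result = compute_hashes(str(sample), [(standin_hash, "standin")])
    print(result)
```

In the real plugin, the hasher list would hold `(do_tlsh, "tlsh")` and, when SSDEEP is present, `(do_ssdeep, "ssdeep")`.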