Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fuzzyhashing #152

Merged
merged 12 commits into from
Mar 20, 2024
Merged

Fuzzyhashing #152

merged 12 commits into from
Mar 20, 2024

Conversation

mcutshaw
Copy link
Collaborator

@mcutshaw mcutshaw commented Mar 1, 2024

Summary

If merged this pull request will address #149

Proposed changes

This request adds a fuzzy hash plugin using SSDEEP and TLSH. It places output in the metadata field of the main output BOM.

@nightlark
Copy link
Collaborator

Awesome! I'm out this upcoming week on vacation with limited internet access, so I might not be able to fully review the changes until March 11.

Initial thoughts are that overall it looks good, the main things are:

  • pre-commit lint check about using a generator instead of all with a list comprehension (or restructure it to avoid creating list in order to perform the check)
  • adding a brief docstring description for the fuzzy hashes hookimpl function
  • the description in the pyproject.toml file looks like it may need updating
  • maybe some error checks can be added to detect if there is an issue with the copy of ssdeep that was installed?

@mcutshaw
Copy link
Collaborator Author

mcutshaw commented Mar 5, 2024

Awesome! I'm out this upcoming week on vacation with limited internet access, so I might not be able to fully review the changes until March 11.

Initial thoughts are that overall it looks good, the main things are:

* pre-commit lint check about using a generator instead of `all` with a list comprehension (or restructure it to avoid creating list in order to perform the check)

* adding a brief docstring description for the fuzzy hashes hookimpl function

* the description in the pyproject.toml file looks like it may need updating

* maybe some error checks can be added to detect if there is an issue with the copy of `ssdeep` that was installed?

The requested changes have been made. Also I made SSDEEP its own dependency group, and check whether we can import it. If we can't we just run with TLSH, hopefully that should simplify things.

@mcutshaw
Copy link
Collaborator Author

@nightlark Would you be able to review the most recent changes?

@nightlark
Copy link
Collaborator

The check for SSDEEP as an optional dependency looks like a good option -- I opened an issue on the python-ssdeep repository about pre-compiled binary wheels for common platforms, and checking on the maintenance status. It seems like the python-ssdeep package likely doesn't have an active maintainer at the moment.

Copy link
Collaborator

@nightlark nightlark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had some issues with ssdeep, which I think are related to the ssdeep python package itself (build errors, returned string from ssdeep.hash a string in the form b'<hash>' which looks like it is returning bytes turned into a str instead of just a str).

@nightlark nightlark merged commit 63b6c6c into LLNL:main Mar 20, 2024
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants