qs: automate the construction of databases #796

Open
williballenthin opened this issue Jun 12, 2023 · 3 comments
Labels
QS QUANTUMSTRAND


@williballenthin
Collaborator

via #761 and @r0ny123

For example, to build the expert db, we could use GitHub CI to automatically add the strings from capa rules whenever a rule containing a string is added or updated in the capa-rules repo.
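As a rough sketch of the extraction step such a CI job might run, the function below walks a capa rule that has already been parsed from YAML and collects its literal `string:` feature values. It assumes capa's rule layout, where string features appear as `string:` keys nested anywhere under `rule.features` (inside `and`/`or`/`optional` blocks) and regex strings are written as `/pattern/flags`; `collect_rule_strings` is a hypothetical helper name, not part of capa or QUANTUMSTRAND.

```python
import re
from typing import Any

# Assumption: capa writes regex string features as /pattern/ with optional flags, e.g. /foo/i.
# Those are patterns, not literal strings, so they are skipped.
REGEX_RE = re.compile(r"^/.*/[a-z]*$")


def collect_rule_strings(features: Any) -> set[str]:
    """Recursively collect literal `string:` values from a parsed capa rule's features."""
    found: set[str] = set()
    if isinstance(features, dict):
        for key, value in features.items():
            if key == "string" and isinstance(value, str):
                if not REGEX_RE.match(value):
                    found.add(value)
            else:
                # Descend into logic blocks such as `and`, `or`, `optional`.
                found |= collect_rule_strings(value)
    elif isinstance(features, list):
        for item in features:
            found |= collect_rule_strings(item)
    return found
```

In a workflow, the changed rule files could be loaded with PyYAML's `yaml.safe_load` and passed as `doc["rule"]["features"]`; walking only the features subtree keeps metadata fields out of the result.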

@williballenthin williballenthin added the QS QUANTUMSTRAND label Jun 12, 2023
@williballenthin
Collaborator Author

For the #common database, this took many hours to build: about a dozen hours to fetch the samples from VT, a few hours to extract strings, and a few hours to index the results. I'm not sure this would fit within our GitHub Actions limits. I'm also not sure how frequently this data is likely to change, though it's certainly worth investigating.
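To make the extract and index stages concrete, here is a minimal sketch that assumes the index is simply a count of how many samples contain each printable-ASCII string (minimum length 4, similar in spirit to the `strings` utility). The actual #common database format may differ, `build_prevalence_index` is a hypothetical name, and UTF-16LE strings, which a real extractor would also need, are ignored here.

```python
import re
from collections import Counter
from typing import Iterable

# Assumption: runs of at least MIN_LEN printable ASCII bytes count as strings.
MIN_LEN = 4
ASCII_RE = re.compile(rb"[\x20-\x7e]{%d,}" % MIN_LEN)


def extract_ascii_strings(data: bytes) -> set[str]:
    """Return the distinct printable-ASCII strings found in one sample."""
    return {m.group().decode("ascii") for m in ASCII_RE.finditer(data)}


def build_prevalence_index(samples: Iterable[bytes]) -> Counter:
    """Count, for each string, the number of samples that contain it.

    Using a set per sample means a string repeated within one file is counted once.
    """
    index: Counter = Counter()
    for data in samples:
        index.update(extract_ascii_strings(data))
    return index
```

Because each sample is processed independently, the extraction stage parallelizes easily; the multi-hour cost mentioned above is dominated by fetching samples, which this sketch does not address.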

@williballenthin
Collaborator Author

The #expert database is pre-populated with strings from capa rules; however, this was honestly just a shortcut to get something in there. We would like the #expert database to be something that is super easy for users to update and contribute back to, such as via a small TUI program or a GitHub PR.

I think there are actually many bad entries in the database today from capa, things like "kernel32.dll". So I'm hesitant to keep pulling these strings from capa automatically. Maybe we can tag updates to capa-rules with follow-up actions to manually update the #expert database when a good string is found?
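One way such a follow-up action could stay manual is to only produce a review list rather than write to the database. The sketch below assumes the job already has the sets of rule strings before and after a capa-rules update; `strings_for_review` and the `GENERIC_STRINGS` denylist are hypothetical, and every entry in the denylist beyond "kernel32.dll" is purely illustrative.

```python
from typing import Iterable

# Hypothetical denylist of generic strings that should never be proposed.
# Only "kernel32.dll" comes from the discussion; the others are illustrative.
GENERIC_STRINGS = frozenset({"kernel32.dll", "ntdll.dll", "user32.dll"})


def strings_for_review(
    before: Iterable[str],
    after: Iterable[str],
    denylist: frozenset = GENERIC_STRINGS,
) -> list[str]:
    """List strings newly introduced by a rules update, minus known-generic ones.

    Nothing is written to the #expert database; the sorted list is meant to be
    posted (e.g. in an issue or PR comment) for a human to accept or reject.
    """
    added = set(after) - set(before)
    # Case-insensitive check, since DLL names appear in varying case.
    return sorted(s for s in added if s.lower() not in denylist)
```

Keeping the output as a sorted list makes the review comment deterministic across runs.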

@r0ny123
Contributor

r0ny123 commented Jun 27, 2023

> For the #common database, this took many hours to build: about a dozen hours to fetch the samples from VT, a few hours to extract strings, and a few hours to index the results. I'm not sure this would fit within our GitHub Actions limits. I'm also not sure how frequently this data is likely to change, though it's certainly worth investigating.

We can fetch that info from VT on a weekly or monthly basis. Regarding the GitHub Actions limits, we can leverage a cloud platform such as AWS. I like how OALabs/hashdb leverages that.

> Maybe we can tag updates to capa-rules with follow-up actions to manually update the #expert database when a good string is found?

This is a good idea!
