results: define output format/schema #721
Comments
@mr-tz I have worked on adding a new format to parse output JSON back into capa in the past [PR_#1396]. Can I look into this?
Sounds great, please take a look and let's discuss if you have any questions or a design draft.
Could you please shed some light on this one?
QS uses a bunch of embedded databases to provide context about strings: things like prevalence, library, version, etc. So all the information from each database should be merged into a record for each recovered string.
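To illustrate the merge described above, here is a minimal sketch in Python. The database names, their contents, and the `merge_records` helper are all hypothetical placeholders for illustration, not the project's actual databases or API.

```python
# Hypothetical stand-ins for the embedded databases (made-up contents).
# Each maps a string to whatever that database knows about it.
PREVALENCE_DB = {"VirtualQuery": {"prevalence": "common"}}
LIBRARY_DB = {"VirtualQuery": {"library": "example-lib", "version": "1.0"}}


def merge_records(strings, databases):
    """Build one record per recovered string, merging hits from every database."""
    records = []
    for s in strings:
        record = {"string": s}
        for name, db in databases.items():
            hit = db.get(s)
            if hit is not None:
                # Keep each database's result under its own key so that
                # sources stay distinguishable and cannot overwrite each other.
                record[name] = hit
        records.append(record)
    return records


records = merge_records(
    ["VirtualQuery", "unique"],
    {"prevalence": PREVALENCE_DB, "library": LIBRARY_DB},
)
```

A string that no database knows about (here `"unique"`) simply ends up with a record containing only the string itself.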
@williballenthin @mr-tz for further discussion and input, I have created a new PR #972 :)
Hi @ooprathamm, pulling the discussion to this issue. At a higher design level, we'll have to see how we want to deal with structure vs. tagged strings vs. other functionality. Ideally, we can decouple the storage and logic a bit. The current POC implementation is quite elegant but IMO combines multiple features, potentially complicating further work. On the other hand, we may keep the extraction logic and just change the resulting document. In my head I currently have something like (based on some of your work, here, thanks!):
{
  "strings": {
    "static_strings": [
      {
        "string": {
          "encoding": "ascii",
          "slice": {
            "range": {
              "length": 40,
              "offset": 77
            }
          },
          "string": "!This program cannot be run in DOS mode."
        },
        "structure": "pe.header",
        "tags": [
          "#common"
        ]
      },
      {
        "string": {
          "encoding": "ascii",
          "slice": {
            "range": {
              "length": 12,
              "offset": 11644
            }
          },
          "string": "VirtualQuery"
        },
        "structure": "import table",
        "tags": [
          "#winapi",
          "#common"
        ]
      }
    ]
  }
}
And/or we add a meta section storing the optional layout (PE, ELF) of a file. This may require further discussion and be a larger effort, but I'd be curious to hear your thoughts.
Thanks for re-sparking this discussion @mr-tz. I think things like location, length, encoding, and content of the string are part of the definition of the (static) string and should be at the top level, or under … . Other information, like structure, tags, and prevalence, is more like "context": things we assess about the string beyond its definition. I suspect each database/algorithm can provide its own context, and we haven't explored all of them yet. So maybe all this context gets grouped together in an extensible way. File layout seems orthogonal to (static) strings and probably should be stored separately from the strings. A presentation layer could stitch together all the data and make it look pretty.
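One way to picture the split between a string's definition and its extensible context is the following Python sketch. The `StaticString` class and its field names are assumptions made for illustration, not an agreed schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict


@dataclass
class StaticString:
    # Definition of the string: these fields sit at the top level.
    string: str
    encoding: str
    offset: int
    length: int
    # Context: open-ended, so each database/algorithm can add its own keys
    # (structure, tags, prevalence, ...) without changing the class.
    context: Dict[str, Any] = field(default_factory=dict)


s = StaticString(string="VirtualQuery", encoding="ascii", offset=11644, length=12)
s.context["structure"] = "import table"
s.context["tags"] = ["#winapi", "#common"]

# Serialize to JSON for storage or exchange.
doc = json.dumps(asdict(s), indent=2)
```

Keeping `context` as a plain mapping trades type safety for extensibility; a stricter schema could be layered on once the set of context providers is better understood.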
Thanks for the review @mr-tz @williballenthin
An alternative representation could then look like this:
{
  "strings": {
    "static_strings": [
      {
        "id": 1,
        "encoding": "ascii",
        "offset": 77,
        "length": 40,
        "string": "!This program cannot be run in DOS mode."
      },
      {
        "id": 1337,
        "encoding": "ascii",
        "offset": 11644,
        "length": 12,
        "string": "VirtualQuery"
      },
      {
        "id": 9999,
        "encoding": "ascii",
        "offset": 123456,
        "length": 6,
        "string": "unique"
      }
    ],
    "context": {
      "1": {
        "structure": "pe.header",
        "tags": [
          "#common"
        ]
      },
      "1337": {
        "structure": "import table",
        "tags": [
          "#winapi",
          "#common"
        ]
      }
      # no 9999 entry
    }
  },
  "file_layout": {
    ...
  }
}
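As a rough sketch of how a presentation layer might stitch this representation back together, the Python below joins each string with its optional context entry by id. The `stitch` helper is hypothetical, and the context keys are assumed to be strings because JSON object keys always are.

```python
# Sample document following the alternative representation (file_layout omitted).
document = {
    "strings": {
        "static_strings": [
            {"id": 1, "encoding": "ascii", "offset": 77, "length": 40,
             "string": "!This program cannot be run in DOS mode."},
            {"id": 1337, "encoding": "ascii", "offset": 11644, "length": 12,
             "string": "VirtualQuery"},
            {"id": 9999, "encoding": "ascii", "offset": 123456, "length": 6,
             "string": "unique"},
        ],
        "context": {
            "1": {"structure": "pe.header", "tags": ["#common"]},
            "1337": {"structure": "import table", "tags": ["#winapi", "#common"]},
        },
    }
}


def stitch(doc):
    """Join every static string with its context entry, if one exists."""
    section = doc["strings"]
    context = section.get("context", {})
    rows = []
    for s in section["static_strings"]:
        # Strings without a context entry (e.g. id 9999) pass through unchanged.
        extra = context.get(str(s["id"]), {})
        rows.append({**s, **extra})
    return rows


rows = stitch(document)
```

Because the context lives in its own mapping, strings with nothing to report cost no extra space, and new context providers can be added without touching the string definitions.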
To store and exchange results, we'll need a new output schema, likely JSON.
The UI will render this data (or parts of it as they become available, although this should be quick).
Again, likely an array of objects (combining all other keys from the databases?) should work here.