Add more helper functions to inspection #37
Comments
The inspector gives you a lot of information, including:
Here is an example pattern, modified slightly from one of the test cases:
The inspector returns a 3-tuple consisting of comparison data, observation operators, and qualifiers. The most complex data is the comparison data. Below is the comparison data which would currently be returned for the above pattern:

{'bar': [(['foo'], 'NOT >', '33')],
 'baz': [(['bar'], 'ISSUBSET', "'1234'"), (['quux'], 'NOT LIKE', "'a_cd'")],
 'foo': [(['bar', 'xyz'], '=', '1')]}

The outermost structure is:
The structure of each element of each list is:
Observation operators are given as a set of strings. Each operator in use is included:

{'OR', 'FOLLOWEDBY'}

Finally, each qualifier in use is also given as a set of strings:

{'REPEATS 12 TIMES'}

So I think the information described in your first bullet is there. The information from the second bullet might be there, depending on what you're looking for. "Observation expression" encompasses a lot of inner structure. I think the components of that structure are basically there.

Here is a usage example I wrote, for a different task. It is intended to determine whether a pattern consists of a single ipv4 address equality comparison:

def is_ipv4_equals_pattern(pattern):
    """
    Determines whether the given pattern is of the form
    [ipv4:value = <some_address>]
    """
    results = pattern.inspect()
    return not results.observation_ops \
        and not results.qualifiers \
        and len(results.comparisons) == 1 \
        and "ipv4-addr" in results.comparisons \
        and len(results.comparisons["ipv4-addr"]) == 1 \
        and results.comparisons["ipv4-addr"][0][0] == ["value"] \
        and results.comparisons["ipv4-addr"][0][1] in ("=", "==")

The things it checks are, in order:
So that's a lot of checks, but I think it shows that you can get a lot of specificity from the data provided. You could probably think of even more that could be included, but I think the more structure and complexity there is, the more complicated the resulting checks can be. I just wanted to determine whether the pattern was a simple equality comparison of an IP address, but that took seven separate tests. It could grow even larger. A balance probably needs to be struck, between flexibility and usability. Does this address your needs? |
Thanks. Yes, I saw that the inspector can provide that structure already. This issue was kind of about the opposite...not the information that you can get, but how easy it is to get and pull it out. For example, to get the list of objects you would collect the unique set of keys in the dictionary. To get the operators you would get the (unique) second value of the tuple for each of the values. That all is fine and workable, but it seemed to me like something that many of the users of the library would want...so if those convenience functions were provided as part of the library it would save everyone from having to rewrite them. E.g. if the inspect function returned an intermediate object you could call functions like |
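To make the request above concrete, here is a minimal sketch of what such convenience helpers could look like when written against the comparison data quoted earlier in the thread. The function names (`object_types`, `comparison_operators`, `paths_by_object`) are hypothetical and are not part of the library; the sample dict is copied from the inspector output shown above.

```python
from typing import Dict, List, Set, Tuple

# Shape of the comparison data as quoted earlier in this thread:
# {object_type: [(path, operator, value), ...]}
Comparisons = Dict[str, List[Tuple[list, str, str]]]

SAMPLE: Comparisons = {
    'bar': [(['foo'], 'NOT >', '33')],
    'baz': [(['bar'], 'ISSUBSET', "'1234'"), (['quux'], 'NOT LIKE', "'a_cd'")],
    'foo': [(['bar', 'xyz'], '=', '1')],
}


def object_types(comparisons: Comparisons) -> Set[str]:
    """Hypothetical helper: the unique object types, i.e. the dict keys."""
    return set(comparisons)


def comparison_operators(comparisons: Comparisons) -> Set[str]:
    """Hypothetical helper: the unique second element of every tuple."""
    return {op for entries in comparisons.values() for _path, op, _value in entries}


def paths_by_object(comparisons: Comparisons) -> Dict[str, Set[tuple]]:
    """Hypothetical helper: the property paths referenced per object type."""
    return {
        obj: {tuple(path) for path, _op, _value in entries}
        for obj, entries in comparisons.items()
    }
```

Each helper is only a line or two, which is the point being made in the comment: they are easy to write, but every consumer of the inspector output would otherwise end up rewriting them.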
@infosec-alchemist and I were talking about this before I saw this issue. I agree we need to figure out how the data from the inspector is likely to be used, and make sure that data is easily accessible. |
I created a function in pattern.py which returns a JSON object containing a list of the objects and properties that are part of the pattern. I'm basically taking the pattern object and extracting parts. This is needed as part of Unfetter, which makes the call. However, maybe I should have a different Python program that takes the Pattern object and formats it for the outside program, letting pattern.py be more of the workhorse. I think that's more about how you want to architect your code's interaction with other programs. |
@johnwunder @infosec-alchemist, just wanted to bump this issue. Is there anything that's needed from this library (for the pattern translator perhaps)? |
I removed this from the 1.0 milestone, since we're trying to get a 1.0 version out pretty soon, and I'm still not sure what additional helper functions would be useful. |
I've been working on a pattern expression parser at Perch (@usePF) that spits out a dict-based tree representation of a Pattern. I figure a tool/language is only as effective as one's ability to troubleshoot/debug it, so along with the dict-based tree is a YAML-based DSL for human consumption. The working title has been "Pattern Tree". I'd like to open-source the thing, and had a hunch it might fit in this repo. I can work on a PR in my spare time if y'all agree.

So, I, uhhhhh... might've written something resembling an entire spec for the thing... but sparing y'all that, here's an excerpt from the original PR. If you really wanna subject yourself to the torture, I can reproduce the spec :P

Excerpt

This PR proposes handling STIX2 Pattern Expressions with a new class,

from intel.pattern import Pattern
pattern = Pattern("[domain-name:value = 'http://xyz.com/download']")
assert pattern.to_dict_tree() == {
    'pattern': {
        'observation': {
            'objects': {'domain-name'},
            'join': None,
            'qualifiers': None,
            'expressions': [
                {'comparison': {
                    'object': 'domain-name',
                    'path': ['value'],
                    'negated': None,
                    'operator': '=',
                    'value': 'http://xyz.com/download',
                }}
            ]
        }
    }
}

A specialized YAML representation is also proposed, to make visualization of this data a little less cumbersome:

from intel.pattern import Pattern
pattern = Pattern("[domain-name:value = 'http://xyz.com/download']")
assert str(pattern.to_dict_tree()) == '''\
pattern:
  observation:
    objects: {domain-name}
    join:
    qualifiers:
    expressions:
      - comparison:
          object: domain-name
          path: [value]
          negated:
          operator: '='
          value: http://xyz.com/download
''' |
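For readers wondering how a dict-based tree like this might be consumed, here is a small sketch that walks the structure and collects every comparison. It assumes only the shape shown in the excerpt (a `comparison` key holding a dict with `object`, `path`, `operator`, and `value`); the `TREE` literal is copied from the excerpt, and `iter_comparisons` and `summarize` are hypothetical names, not part of the proposed class.

```python
from typing import Any, Iterator, List, Tuple

# Copied from the excerpt above; deeper nesting is an assumption.
TREE = {
    'pattern': {
        'observation': {
            'objects': {'domain-name'},
            'join': None,
            'qualifiers': None,
            'expressions': [
                {'comparison': {
                    'object': 'domain-name',
                    'path': ['value'],
                    'negated': None,
                    'operator': '=',
                    'value': 'http://xyz.com/download',
                }}
            ],
        }
    }
}


def iter_comparisons(node: Any) -> Iterator[dict]:
    """Yield every dict stored under a 'comparison' key, depth-first."""
    if isinstance(node, dict):
        for key, child in node.items():
            if key == 'comparison' and isinstance(child, dict):
                yield child
            else:
                yield from iter_comparisons(child)
    elif isinstance(node, (list, tuple)):
        for child in node:
            yield from iter_comparisons(child)
    # Sets, strings, and None are leaves here and are skipped.


def summarize(tree: dict) -> List[Tuple[str, tuple, str, Any]]:
    """Flatten the tree into (object, path, operator, value) tuples."""
    return [
        (c['object'], tuple(c['path']), c['operator'], c['value'])
        for c in iter_comparisons(tree)
    ]
```

Because the walker recurses through any dict or list, it should also cope with nested observation expressions, provided they reuse the same `comparison` node shape; that is an assumption, since the excerpt only shows a single flat observation.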
I am unclear on your goal for this: is it intended to represent the complete original pattern semantics, or just some selected details (like the pattern inspector which is the subject of this issue)? The name "Pattern Tree" makes me think of an AST, which is intended to capture full semantics, but it's not clear from these examples that your trees do that. |
Aye, the intention is to retain the complete original pattern semantics. After reading your comment, I figured the pattern inspector is doing exactly what it's meant to, so I've packaged this PatternTree thing as a standalone library: https://github.com/usePF/dendrol |
Ok. Fyi, some AST functionality was written, although it is currently not centralized in one place. There are AST node classes in the stix2 project, but the only AST building code is in the slider as far as I know, since I guess that's the only place an AST has been needed so far. We have talked about other uses for it, e.g. using pattern structure to determine semantic equivalence. Maybe if there were enough of a need for it outside of the slider, the AST builder would receive a "promotion" of some sort, to the main stix2 project. |
Ah, maybe "the intention" is too strong of a posit; it's one of the intentions, for sure. The real impetus was wanting to extract from a Pattern expression not just what was in it, but how those things fit together.

I may've missed some functionality (no, I definitely missed a bunch of shit) while searching the OASIS STIX libraries, but if nonstandard evaluation of expressions is desired (like, if the whole STIX2 Observed Data bundle isn't available at once to be matched against), it seemed there were only two options: pass an ANTLR Listener to

Not many devs understand how to work with ANTLR, and STIX2 Patterns are too fuckin cool to hide behind that wall.

Working with STIX2 objects and relationships between them feels pretty natural in Python, but the Pattern expressions have felt pretty opaque. As a dev, one might encounter Patterns in Indicators, see the possibilities, and want them to be more than just strings. One might naturally end up in this repo, thinking

If the community at large is to do cool shit with STIX2, like I'm excited to do, I can't ask them to understand formal grammars or ANTLR. That's the impetus of the tool, and why I thought it made sense here.

(reading all that back, I'm realizing the magnitude of my passion. I won't ask you to forgive it, but I acknowledge my bluntness) |
It would be nice to get summary stats about a pattern:
This would help you determine whether a pattern was parsable or usable by your tool.
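As an illustration of that use case, here is a rough sketch that compares the three pieces of inspector output quoted earlier in the thread (comparison data, observation operators, qualifiers) against what a tool supports. The function names and the `supported_*` parameters are hypothetical; nothing here is part of the library's API.

```python
from typing import Dict, Iterable, List, Set, Tuple

Comparisons = Dict[str, List[Tuple[list, str, str]]]


def unsupported_features(
    comparisons: Comparisons,
    observation_ops: Iterable[str],
    qualifiers: Iterable[str],
    supported_objects: Iterable[str],
    supported_comparison_ops: Iterable[str],
    supported_observation_ops: Iterable[str] = (),
    supported_qualifiers: Iterable[str] = (),
) -> Dict[str, Set[str]]:
    """Report everything the pattern uses that the tool does not support."""
    used_ops = {op for entries in comparisons.values() for _p, op, _v in entries}
    return {
        "objects": set(comparisons) - set(supported_objects),
        "comparison_ops": used_ops - set(supported_comparison_ops),
        "observation_ops": set(observation_ops) - set(supported_observation_ops),
        # Qualifiers are compared as whole strings, which is crude:
        # 'REPEATS 12 TIMES' and 'REPEATS 3 TIMES' count as different.
        "qualifiers": set(qualifiers) - set(supported_qualifiers),
    }


def is_usable(report: Dict[str, Set[str]]) -> bool:
    """A pattern is usable when no category reports anything unsupported."""
    return not any(report.values())


# Inspector output as quoted earlier in the thread.
comparisons = {
    'bar': [(['foo'], 'NOT >', '33')],
    'baz': [(['bar'], 'ISSUBSET', "'1234'"), (['quux'], 'NOT LIKE', "'a_cd'")],
    'foo': [(['bar', 'xyz'], '=', '1')],
}
report = unsupported_features(
    comparisons,
    observation_ops={'OR', 'FOLLOWEDBY'},
    qualifiers={'REPEATS 12 TIMES'},
    supported_objects={'foo', 'bar'},
    supported_comparison_ops={'=', 'NOT >'},
    supported_observation_ops={'OR'},
)
```

Returning the unsupported items per category, rather than a bare boolean, lets a tool explain why it rejected a pattern.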