Datajson v1.0 #12102
Conversation
This patch introduces a new keyword, datajson, that is similar to dataset with a twist. Where dataset allows matching against sets, datajson allows the same but also adds JSON data to the alert event. This data comes from the set definition itself. For example, an ipv4 set entry will look like:

```
10.16.1.11,{"test": "success","context":3}
```

The syntax is the value and the JSON data separated by a comma. The syntax of the keyword is the following:

```
datajson:isset,src_ip,type ip,load src.lst,key src_ip;
```

Compared to dataset, it just has a supplementary option, key, that is used to indicate in which subobject the JSON value should be added. The information is added to the event under the alert.extra subobject:

```
"alert": {
  "extra": {
    "src_ip": {
      "test": "success",
      "context": 3
    },
```

The main interest of the feature is the ability to contextualize a match. For example, if you have an IOC source, you can do:

```
value1,{"actor":"APT28","Country":"FR"}
value2,{"actor":"APT32","Country":"NL"}
```

This way, a single dataset is able to add context to the event, where previously that was not possible and multiple signatures had to be used.

Ticket: OISF#7372
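Putting the pieces together, a minimal end-to-end sketch (the rule header and sid here are hypothetical; the keyword options and set-file format are as described above):

```
# src.lst — one entry per line: value,<JSON object>
10.16.1.11,{"test": "success","context":3}

# hypothetical signature using the keyword
alert ip any any -> any any (msg:"datajson IOC match"; \
    datajson:isset,src_ip,type ip,load src.lst,key src_ip; sid:1000001;)
```

A match on 10.16.1.11 would then carry the attached JSON object under alert.extra.src_ip in the alert event.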
The previous code used an array, which introduced a limit on the number of datajson keywords that can be used in a signature. This patch uses a linked list instead to overcome that limit. By making the first element of the list part of the structure itself, we limit the cost of the feature to a single structure member added to the PacketAlert structure. Only the PacketAlertFree function is impacted, as we need to iterate over the list to find potential allocations.

Ticket: OISF#7372
JSON values containing spaces were not handled correctly, as they were seen as multiple arguments.

Ticket: OISF#7372
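One way to avoid that class of bug is to split each set line at the first comma only, so that spaces (and commas) inside the JSON part stay intact. A small hypothetical helper, not the patch's actual code:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split "value,{json}" at the FIRST comma only,
 * so spaces and commas inside the JSON object are preserved. */
static int DatajsonSplitLine(const char *line, char *value, size_t vsize,
        const char **json)
{
    const char *comma = strchr(line, ',');
    if (comma == NULL)
        return -1;
    size_t vlen = (size_t)(comma - line);
    if (vlen + 1 > vsize)
        return -1;
    memcpy(value, line, vlen);
    value[vlen] = '\0';
    *json = comma + 1; /* rest of the line, spaces included */
    return 0;
}
```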
Codecov Report

Attention: Patch coverage is incomplete.

Additional details and impacted files:

```diff
@@            Coverage Diff             @@
##           master   #12102       +/-   ##
===========================================
- Coverage   83.23%   67.43%   -15.81%
===========================================
  Files         906      846       -60
  Lines      257647   156207   -101440
===========================================
- Hits       214458   105332   -109126
- Misses      43189    50875     +7686
```

Flags with carried forward coverage won't be shown.
Information: ERROR: QA failed on SURI_TLPW2_autofp_suri_time.
Pipeline 23289
Some minor comments.
Bigger issue: I like this work, but I would like to understand why this needs a new set type and keyword type? Can we not overload the existing dataset/datarep facilities?
```c
if (set == NULL)
    return rrep;

if (data_len != 16 && data_len != 4)
```
how would we get here with data_len == 4?
GetDataDst and GetDataSrc in detect-ipaddr.c set a buffer length of 4 or 16 depending on the IP type:

```c
static InspectionBuffer *GetDataDst(DetectEngineThreadCtx *det_ctx,
        const DetectEngineTransforms *transforms, Packet *p, const int list_id)
{
    InspectionBuffer *buffer = InspectionBufferGet(det_ctx, list_id);
    if (buffer->inspect == NULL) {
        if (PacketIsIPv4(p)) {
            /* Suricata stores the IPv4 at the beginning of the field */
            InspectionBufferSetup(det_ctx, list_id, buffer, p->dst.address.address_un_data8, 4);
        } else if (PacketIsIPv6(p)) {
            InspectionBufferSetup(det_ctx, list_id, buffer, p->dst.address.address_un_data8, 16);
```

so we can get both lengths. As Suricata stores the IPv4 address at the beginning of the field, this works.
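A tiny standalone sketch (not Suricata code) of why inspecting the same 16-byte address field with a length of 4 works for IPv4:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for Suricata's address union: 16 bytes, with an
 * IPv4 address occupying the first 4 bytes of the field. */
typedef struct Address {
    uint8_t address_un_data8[16];
} Address;

/* Inspection length: 4 for IPv4, 16 for IPv6. Both lengths select a
 * prefix of the same underlying buffer. */
static size_t AddressInspectLen(int is_ipv4)
{
    return is_ipv4 ? 4u : 16u;
}
```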
Information: ERROR: QA failed on SURI_TLPW2_autofp_suri_time.
Pipeline 23316
Did you forget to use OISF/suricata-verify#2123 here?
```diff
@@ -212,6 +212,10 @@
     "xff": {
         "type": "string"
     },
+    "extra": {
+        "type": "object",
```
I'm trying to get a description on everything new for documentation purposes, can you add one?
I'm a little hung up on the "extra" object. Is this data much different from metadata? For example, if we were to spin a rule for each entry in the dataset, we'd add extra context in the message, or in metadata; here we're just adding extra metadata from the dataset. Also, with "extra" being a very generic name, should it be available to possible future items that might want to add extra metadata? For example, a Lua rule that wants to add extra metadata? Or should this be restricted to datajson and be given a more specific name?
Is there a real need for a new dataset type of
I will admit I was kind of expecting to have a discussion on the chosen keyword. I'm not super happy with the choice, but I did not find a better one. I think we need to avoid conflict; people parsing signature metadata are kind of expecting to just have that in
The concept of extracting information at detection time seems pretty good. Flowbits are currently used for that (lua, pcre extraction), but the result is then propagated to all events in the flowbits object. It may be better to offer a per-alert way of doing this. Maybe we can stay with
I think it is more on the educational side. Potentially also a bit about keeping the code simple. On the educational/usage side, it allows signature readers to understand very quickly what is really going on. The datajson keyword is rather explicit and different from dataset, so users can understand that there will be consequences.
Can you explain how it's different from a dataset that contains additional metadata to attach to the alert? It's almost like
Currently, what the code is doing is:
So it is really a datajson, as we don't encode and re-encode at output.
Closing in favor of #12175
Indicators of Compromise (IOCs) are a key element in a Security Operations Center. Datasets
have been a huge step toward getting alerts on IOCs from Suricata, but the produced alerts
lack contextualization. For example, if an IOC management software has a list of host
names, they will be linked to different threat actors, but at ingestion into Suricata they will be
a simple list of strings. This list will be used via a rule like
With this, an alert will have a subject without information, and a mapping will have to be done
a posteriori to see which IOC has hit. The pseudo algorithm to run is:
This works, but it is not optimal, as correlation and external processing have to be done for all the matches.
To fix this issue, we need to be able to ingest the IOCs without losing the contextual
information contained in the IOC management software.
Datajson is a proposed implementation that addresses this issue. Instead of injecting only
the value list into Suricata, we can attach to each value a JSON object that will end up in
the alert output.
The following example alerts on source and destination IPs in a dataset:
In ip4-json.lst, we have data from inventory:
In bad-json.lst, we have data from the IOC management software:
The result is an alert section that looks like:
Contribution style:
https://docs.suricata.io/en/latest/devguide/contributing/contribution-process.html
Our Contribution agreements:
https://suricata.io/about/contribution-agreement/ (note: this is only required once)
Changes (if applicable):
(including schema descriptions)
https://redmine.openinfosecfoundation.org/projects/suricata/issues
Link to ticket: https://redmine.openinfosecfoundation.org/issues/7372
Describe changes:
SV_REPO=https://github.com/regit/suricata-verify/
SV_BRANCH=datajson-v1.0