Datajson v1.0 #12102

Closed
wants to merge 8 commits into from
Changes from 7 commits
73 changes: 71 additions & 2 deletions doc/userguide/rules/datasets.rst
@@ -3,8 +3,8 @@
Datasets
========

Using the ``dataset`` and ``datarep`` keyword it is possible to match on
large amounts of data against any sticky buffer.
Using the ``dataset``, ``datarep`` and ``datajson`` keywords it is possible
to match on large amounts of data against any sticky buffer.

For example, to match against a DNS black list called ``dns-bl``::

@@ -145,6 +145,26 @@ reputation lists. A MD5 list, a SHA256 list, and a raw string (buffer) list.
The rules will only match if the data is in the list and the reputation
value is higher than 200.

datajson
~~~~~~~~

DataJSON allows matching data against a set and outputting the data attached to
the matching value in the event.

Syntax::

    datajson:<cmd>,<name>,<options>;

    datajson:<isset|isnotset>,<name> \
        [, type <string|md5|sha256|ipv4|ip>, load <file name>, memcap <size>, hashsize <size>, key <json_key>];

Example rules could look like::

    alert http any any -> any any (msg:"IP match"; ip.dst; datajson:isset,bad_ips, type ip, load bad_ips.csv, key bad_ones; sid:8000001;)

In this example, the rule matches if the destination IP is in the set, and the
alert will have an ``alert.extra.bad_ones`` subobject containing the JSON
data associated with the value.
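As a rough illustration of the paragraph above, the following Python sketch reads one alert event and pulls out the ``bad_ones`` subobject. The event is hand-written for this example and is not captured Suricata output: only the ``alert.extra.bad_ones`` path comes from the text above, while the fields inside it (``country``, ``score``) and the helper name ``extract_extra`` are assumptions.

```python
import json

# Hand-written sample event. Only the alert.extra.bad_ones path is taken from
# the documentation above; every value here is made up for illustration.
SAMPLE_EVENT = """
{
  "event_type": "alert",
  "dest_ip": "198.51.100.7",
  "alert": {
    "signature_id": 8000001,
    "signature": "IP match",
    "extra": {"bad_ones": {"country": "ZZ", "score": 7}}
  }
}
"""


def extract_extra(event_line, key):
    """Return alert.extra[key] from a JSON event, or None if it is absent."""
    event = json.loads(event_line)
    return event.get("alert", {}).get("extra", {}).get(key)


if __name__ == "__main__":
    print(extract_extra(SAMPLE_EVENT, "bad_ones"))
```

The ``key`` option in the rule decides the name of the subobject, so a consumer would look up whatever name was given there.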

Rule Reloads
------------
@@ -243,6 +263,44 @@ Syntax::

dataset-dump

datajson-add
~~~~~~~~~~~~

Unix Socket command to add data to a set. On success, the addition becomes
active instantly.

Syntax::

    datajson-add <set name> <set type> <data> <json_info>

set name
  Name of an already defined dataset
type
  Data type: string, md5, sha256, ipv4, ip
data
  Data to add in serialized form (base64 for string, hex notation for md5/sha256, string representation for ipv4/ip)
json_info
  JSON data to attach to the added value

Example adding 'google.com' to set 'myset' with some JSON data attached::

    datajson-add myset string Z29vZ2xlLmNvbQ== {"city":"Mountain View"}
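The serialization rules above can be sketched in Python as follows. This is a minimal sketch that only builds the command line shown in the example; the helper names ``serialize_value`` and ``build_datajson_add`` are hypothetical and are not part of suricatasc.

```python
import base64
import json


def serialize_value(set_type, value):
    """Serialize a value for the given set type, following the rules above."""
    if set_type == "string":
        # Strings are sent base64 encoded.
        return base64.b64encode(value.encode("utf-8")).decode("ascii")
    # md5/sha256 are expected in hex notation and ipv4/ip as plain strings,
    # so the value is passed through unchanged.
    return value


def build_datajson_add(set_name, set_type, value, info):
    """Build a datajson-add command line (hypothetical helper)."""
    # Compact separators avoid adding spaces to the JSON argument.
    json_info = json.dumps(info, separators=(",", ":"))
    data = serialize_value(set_type, value)
    return f"datajson-add {set_name} {set_type} {data} {json_info}"


if __name__ == "__main__":
    print(build_datajson_add("myset", "string", "google.com",
                             {"city": "Mountain View"}))
```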

datajson-remove
~~~~~~~~~~~~~~~

Unix Socket command to remove data from a set. On success, the removal becomes
active instantly.

Syntax::

    datajson-remove <set name> <set type> <data>

set name
  Name of an already defined dataset
type
  Data type: string, md5, sha256, ipv4, ip
data
  Data to remove in serialized form (base64 for string, hex notation for md5/sha256, string representation for ipv4/ip)

File formats
------------

@@ -292,6 +350,17 @@ Syntax::

<data>,<value>


datajson
~~~~~~~~

The datajson format follows the dataset format, except that there is one more
CSV field:

Syntax::

    <data>,<json_data>
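A minimal Python sketch of reading one such line is shown here. It assumes that ``string`` data is base64 encoded as in dataset files, and that the JSON field is everything after the first comma (so commas inside the JSON are preserved). The function name ``parse_datajson_line`` is hypothetical.

```python
import base64
import json


def parse_datajson_line(line, set_type="string"):
    """Split '<data>,<json_data>' at the first comma and decode both parts."""
    data, json_data = line.rstrip("\n").split(",", 1)
    if set_type == "string":
        # Assumption: string data is base64 encoded, as in dataset files.
        data = base64.b64decode(data).decode("utf-8")
    return data, json.loads(json_data)


if __name__ == "__main__":
    print(parse_datajson_line('Z29vZ2xlLmNvbQ==,{"city":"Mountain View"}'))
    print(parse_datajson_line('192.0.2.1,{"tags":["a","b"]}', set_type="ip"))
```

Splitting only once matters because the JSON field will usually contain commas of its own.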

.. _datasets_file_locations:

File Locations
4 changes: 4 additions & 0 deletions etc/schema.json
@@ -212,6 +212,10 @@
"xff": {
"type": "string"
},
"extra": {
"type": "object",
Review comment (Member):

I'm trying to get a description on everything new for documentation purposes, can you add one?

"additionalProperties": true
},
"metadata": {
"type": "object",
"properties": {
32 changes: 32 additions & 0 deletions python/suricata/sc/specs.py
@@ -194,6 +194,38 @@
"required": 1,
},
],
"datajson-add": [
{
"name": "setname",
"required": 1,
},
{
"name": "settype",
"required": 1,
},
{
"name": "datavalue",
"required": 1,
},
{
"name": "datajson",
"required": 1,
},
],
"datajson-remove": [
{
"name": "setname",
"required": 1,
},
{
"name": "settype",
"required": 1,
},
{
"name": "datavalue",
"required": 1,
},
],
"get-flow-stats-by-id": [
{
"name": "flow_id",
7 changes: 7 additions & 0 deletions python/suricata/sc/suricatasc.py
@@ -113,6 +113,8 @@ def __init__(self, sck_path, verbose=False):
"memcap-show",
"dataset-add",
"dataset-remove",
"datajson-add",
"datajson-remove",
"get-flow-stats-by-id",
"dataset-clear",
"dataset-lookup",
@@ -218,6 +220,11 @@ def execute(self, command):
cmd_specs = argsd[cmd]
required_args_count = len([d["required"] for d in cmd_specs if d["required"] and not "val" in d])
arguments = dict()
# If every argument of the command is required, split at most
# required_args_count times. This way the last argument may contain
# spaces (datajson-add, for example).
non_req_args_count = len([d for d in cmd_specs if not d["required"] or "val" in d])
if non_req_args_count == 0:
    full_cmd = command.split(maxsplit=required_args_count)
for c, spec in enumerate(cmd_specs, 1):
    spec_type = str if "type" not in spec else spec["type"]
    if spec["required"]:
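The effect of the ``maxsplit`` change in this hunk can be seen in isolation with a short Python sketch. The wrapper ``split_all_required`` is a hypothetical name used only here; the count of 4 matches the four required arguments of ``datajson-add`` in specs.py.

```python
def split_all_required(command, required_args_count):
    """Split into the command name plus at most required_args_count arguments."""
    return command.split(maxsplit=required_args_count)


cmd = 'datajson-add myset string Z29vZ2xlLmNvbQ== {"city":"Mountain View"}'

# With maxsplit the JSON argument stays in one piece.
parts = split_all_required(cmd, 4)

# A plain split breaks the JSON argument at the space inside "Mountain View".
naive = cmd.split()

if __name__ == "__main__":
    print(parts)
    print(naive)
```

Note that this only protects the last argument: a space inside any earlier argument would still shift the fields.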
3 changes: 3 additions & 0 deletions src/Makefile.am
@@ -50,6 +50,7 @@ noinst_HEADERS = \
datasets.h \
datasets-ipv4.h \
datasets-ipv6.h \
datasets-json.h \
datasets-md5.h \
datasets-reputation.h \
datasets-sha256.h \
@@ -102,6 +103,7 @@ noinst_HEADERS = \
detect-config.h \
detect-content.h \
detect-csum.h \
detect-datajson.h \
detect-datarep.h \
detect-dataset.h \
detect-dce-iface.h \
@@ -662,6 +664,7 @@ libsuricata_c_a_SOURCES = \
detect-config.c \
detect-content.c \
detect-csum.c \
detect-datajson.c \
detect-datarep.c \
detect-dataset.c \
detect-dce-iface.c \
34 changes: 34 additions & 0 deletions src/datasets-ipv4.c
@@ -56,3 +56,37 @@ uint32_t IPv4Hash(uint32_t hash_seed, void *s)
void IPv4Free(void *s)
{
}

int IPv4JsonSet(void *dst, void *src)
{
    IPv4TypeJson *src_s = src;
    IPv4TypeJson *dst_s = dst;
    memcpy(dst_s->ipv4, src_s->ipv4, sizeof(dst_s->ipv4));
    dst_s->json.value = src_s->json.value;
    dst_s->json.len = src_s->json.len;

    return 0;
}

bool IPv4JsonCompare(void *a, void *b)
{
    const IPv4TypeJson *as = a;
    const IPv4TypeJson *bs = b;

    return (memcmp(as->ipv4, bs->ipv4, sizeof(as->ipv4)) == 0);
}

uint32_t IPv4JsonHash(uint32_t hash_seed, void *s)
{
    const IPv4TypeJson *str = s;
    return hashword((uint32_t *)str->ipv4, 1, hash_seed);
}

// data stays in hash
void IPv4JsonFree(void *s)
{
    const IPv4TypeJson *as = s;
    if (as->json.value) {
        SCFree(as->json.value);
    }
}
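In the functions above, comparison and hashing look only at the address bytes, while the JSON string is carried along as payload. The same idea can be modelled in Python as a rough analogy; this is not a translation of the C code, and the class name ``IPv4JsonEntry`` is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class IPv4JsonEntry:
    # Key: the four address bytes, used for hashing and comparison.
    ipv4: bytes
    # Payload: excluded from __eq__ and __hash__ via compare=False.
    json: str = field(default="", compare=False)


stored = IPv4JsonEntry(bytes([192, 0, 2, 1]), '{"owner":"example"}')
probe = IPv4JsonEntry(bytes([192, 0, 2, 1]))  # lookup key without JSON

# A lookup with only the address finds the stored entry and its JSON.
table = {stored: stored}
match = table.get(probe)

if __name__ == "__main__":
    print(match.json)
```

One difference the analogy does not capture: the C ``Set`` function copies the ``json.value`` pointer rather than the string, so ownership of the buffer moves to the hash entry, which ``Free`` later releases.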
11 changes: 11 additions & 0 deletions src/datasets-ipv4.h
@@ -25,15 +25,26 @@
#define SURICATA_DATASETS_IPV4_H

#include "datasets-reputation.h"
#include "datasets-json.h"

typedef struct IPv4Type {
    uint8_t ipv4[4];
    DataRepType rep;
} IPv4Type;

typedef struct IPv4TypeJson {
    uint8_t ipv4[4];
    DataJsonType json;
} IPv4TypeJson;

int IPv4Set(void *dst, void *src);
bool IPv4Compare(void *a, void *b);
uint32_t IPv4Hash(uint32_t hash_seed, void *s);
void IPv4Free(void *s);

int IPv4JsonSet(void *dst, void *src);
bool IPv4JsonCompare(void *a, void *b);
uint32_t IPv4JsonHash(uint32_t hash_seed, void *s);
void IPv4JsonFree(void *s);

#endif /* SURICATA_DATASETS_IPV4_H */
33 changes: 33 additions & 0 deletions src/datasets-ipv6.c
@@ -56,3 +56,36 @@ uint32_t IPv6Hash(uint32_t hash_seed, void *s)
void IPv6Free(void *s)
{
}

int IPv6JsonSet(void *dst, void *src)
{
    IPv6TypeJson *src_s = src;
    IPv6TypeJson *dst_s = dst;
    memcpy(dst_s->ipv6, src_s->ipv6, sizeof(dst_s->ipv6));
    dst_s->json.value = src_s->json.value;
    dst_s->json.len = src_s->json.len;

    return 0;
}

bool IPv6JsonCompare(void *a, void *b)
{
    const IPv6TypeJson *as = a;
    const IPv6TypeJson *bs = b;

    return (memcmp(as->ipv6, bs->ipv6, sizeof(as->ipv6)) == 0);
}

uint32_t IPv6JsonHash(uint32_t hash_seed, void *s)
{
    const IPv6TypeJson *str = s;
    return hashword((uint32_t *)str->ipv6, 4, hash_seed);
}

void IPv6JsonFree(void *s)
{
    const IPv6TypeJson *as = s;
    if (as->json.value) {
        SCFree(as->json.value);
    }
}
10 changes: 10 additions & 0 deletions src/datasets-ipv6.h
@@ -31,9 +31,19 @@ typedef struct IPv6Type {
    DataRepType rep;
} IPv6Type;

typedef struct IPv6TypeJson {
    uint8_t ipv6[16];
    DataJsonType json;
} IPv6TypeJson;

int IPv6Set(void *dst, void *src);
bool IPv6Compare(void *a, void *b);
uint32_t IPv6Hash(uint32_t hash_seed, void *s);
void IPv6Free(void *s);

int IPv6JsonSet(void *dst, void *src);
bool IPv6JsonCompare(void *a, void *b);
uint32_t IPv6JsonHash(uint32_t hash_seed, void *s);
void IPv6JsonFree(void *s);

#endif /* __DATASETS_IPV4_H__ */
38 changes: 38 additions & 0 deletions src/datasets-json.h
@@ -0,0 +1,38 @@
/* Copyright (C) 2024 Open Information Security Foundation
*
* You can copy, redistribute or modify this Program under the terms of
* the GNU General Public License version 2 as published by the Free
* Software Foundation.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* version 2 along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
* 02110-1301, USA.
*/

/**
* \file
*
* \author Eric Leblond <[email protected]>
*/

#ifndef __DATASETS_JSON_H__
#define __DATASETS_JSON_H__

#include <stdint.h>
typedef struct DataJsonType {
    char *value;
    size_t len;
} DataJsonType;

typedef struct DataJsonResultType {
    bool found;
    DataJsonType json;
} DataJsonResultType;

#endif /* __DATASETS_JSON_H__ */
33 changes: 33 additions & 0 deletions src/datasets-md5.c
@@ -57,3 +57,36 @@ uint32_t Md5StrHash(uint32_t hash_seed, void *s)
void Md5StrFree(void *s)
{
}

int Md5StrJsonSet(void *dst, void *src)
{
    Md5TypeJson *src_s = src;
    Md5TypeJson *dst_s = dst;
    memcpy(dst_s->md5, src_s->md5, sizeof(dst_s->md5));
    dst_s->json.value = src_s->json.value;
    dst_s->json.len = src_s->json.len;
    return 0;
}

bool Md5StrJsonCompare(void *a, void *b)
{
    const Md5TypeJson *as = a;
    const Md5TypeJson *bs = b;

    return (memcmp(as->md5, bs->md5, sizeof(as->md5)) == 0);
}

uint32_t Md5StrJsonHash(uint32_t hash_seed, void *s)
{
    const Md5TypeJson *str = s;
    return hashword((uint32_t *)str->md5, sizeof(str->md5) / 4, hash_seed);
}

// data stays in hash
void Md5StrJsonFree(void *s)
{
    const Md5TypeJson *as = s;
    if (as->json.value) {
        SCFree(as->json.value);
    }
}