Datajson v5.2 #12863
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -3,8 +3,8 @@ | |
Datasets | ||
======== | ||
|
||
Using the ``dataset`` and ``datarep`` keyword it is possible to match on | ||
large amounts of data against any sticky buffer. | ||
Using the ``dataset`` and ``datarep`` keyword it is possible | ||
to match on large amounts of data against any sticky buffer. | ||
|
||
For example, to match against a DNS black list called ``dns-bl``:: | ||
|
||
|
@@ -79,7 +79,9 @@ Syntax:: | |
dataset:<cmd>,<name>,<options>; | ||
|
||
dataset:<set|unset|isset|isnotset>,<name> \ | ||
[, type <string|md5|sha256|ipv4|ip>, save <file name>, load <file name>, state <file name>, memcap <size>, hashsize <size>]; | ||
[, type <string|md5|sha256|ipv4|ip>, save <file name>, load <file name>, state <file name>, memcap <size>, hashsize <size> | ||
, format <csv|json|jsonline>, enrichment_key <output_key>, value_key <json_key>, array_key <json_path>, | ||
remove_key]; | ||
|
||
type <type> | ||
the data type: string, md5, sha256, ipv4, ip | ||
|
@@ -94,6 +96,23 @@ memcap <size> | |
maximum memory limit for the respective dataset | ||
hashsize <size> | ||
allowed size of the hash for the respective dataset | ||
format <type> | ||
the format of the file: csv, json or jsonline. Defaults to csv. See | ||
:ref:`dataset with json format <datasets_json>` for the json | ||
and jsonline options | ||
enrichment_key <key> | ||
the key under which the JSON data attached to the matching | ||
value is added to the alert event (json and jsonline formats) | ||
value_key <key> | ||
the key, in each JSON object of the file, whose value is | ||
added to the set (json and jsonline formats) | ||
array_key <key> | ||
the path to the JSON array that holds the elements to add | ||
to the set (json format only) | ||
remove_key | ||
if set, the key named by value_key is removed from the | ||
data added to the alert event | ||
|
||
|
||
.. note:: 'type' is mandatory and needs to be set. | ||
|
||
|
@@ -146,6 +165,47 @@ The rules will only match if the data is in the list and the reputation | |
value is higher than 200. | ||
|
||
|
||
.. _datasets_json: | ||
|
||
dataset with json | ||
~~~~~~~~~~~~~~~~~ | ||
|
||
DataJSON allows matching data against a set and outputting the data attached to the matching | ||
value in the event. | ||
|
||
There are two formats supported: ``json`` and ``jsonline``. The difference is that | ||
the ``json`` format is a single JSON object, while the ``jsonline`` format handles files with | ||
one JSON object per line. The ``jsonline`` format is useful for large files | ||
as the parsing is done line by line. | ||
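To make the difference between the two layouts concrete, here is a minimal Python sketch. It is only an illustration and not Suricata's loader: the helper names `load_json` and `load_jsonline`, the sample records, and the choice of a top-level array for the ``json`` sample are all assumptions.

```python
import json


def load_json(text):
    """Parse a whole JSON document at once (the full text is held in memory)."""
    return json.loads(text)


def load_jsonline(lines):
    """Parse one JSON object per line, skipping blank lines."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# Hypothetical sample data: the same two records in both layouts.
json_doc = '[{"ip": "10.0.0.1", "tag": "a"}, {"ip": "10.0.0.2", "tag": "b"}]'
jsonline_doc = '{"ip": "10.0.0.1", "tag": "a"}\n{"ip": "10.0.0.2", "tag": "b"}\n'

records_json = load_json(json_doc)
records_jsonline = list(load_jsonline(jsonline_doc.splitlines()))
```

Both calls yield the same records; the line-by-line variant never needs the whole file parsed as one document, which is the property the text attributes to ``jsonline``.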
|
||
Syntax:: | ||
|
||
dataset:<cmd>,<name>,<options>; | ||
|
||
dataset:<isset|isnotset>,<name> \ | ||
[, type <string|md5|sha256|ipv4|ip>, load <file name>, format <json|jsonline>, memcap <size>, hashsize <size>, enrichment_key <json_key> \ | ||
, value_key <json_key>, array_key <json_path>]; | ||
|
||
Example rules could look like:: | ||
|
||
alert http any any -> any any (msg:"IP match"; ip.dst; dataset:isset,bad_ips, type ip, load bad_ips.json, format json, enrichment_key bad_ones, value_key ip; sid:8000001;) | ||
|
||
In this example, the match will occur if the destination IP is in the set, and the | ||
alert will have an ``alert.extra.bad_ones`` subobject that contains the JSON | ||
data associated with the value (``bad_ones`` comes from the ``enrichment_key`` option). | ||
|
||
When format is ``json`` or ``jsonline``, the ``value_key`` is used to get | ||
the value in the line (``jsonline`` format) or in the array (``json`` format). | ||
At least one element needs to have the ``value_key`` present in the data file for | ||
the load to succeed. | ||
If ``array_key`` is present, Suricata will extract the corresponding subobject, which has to be | ||
a JSON array, and search this array for elements to add to the set. This is only valid for the ``json`` format. | ||
|
||
If you don't want to have the ``value_key`` in the alert, you can use the | ||
``remove_key`` option. This will remove the key from the alert event. | ||
|
||
See :ref:`Datajson format <datajson_data>` for more information. | ||
|
||
Rule Reloads | ||
------------ | ||
|
||
|
@@ -243,6 +303,28 @@ Syntax:: | |
|
||
dataset-dump | ||
|
||
dataset-add-json | ||
~~~~~~~~~~~~~~~~ | ||
|
||
Unix Socket command to add data to a set. On success, the addition becomes | ||
active instantly. | ||
|
||
Syntax:: | ||
|
||
dataset-add-json <set name> <set type> <data> <json_info> | ||
|
||
set name | ||
Name of an already defined dataset | ||
set type | ||
Data type: string, md5, sha256, ipv4, ip | ||
data | ||
Data to add in serialized form (base64 for string, hex notation for md5/sha256, string representation for ipv4/ip) | ||
json_info | ||
JSON data to attach to the added value | ||
|
||
Example adding 'google.com' to set 'myset':: | ||
|
||
dataset-add-json myset string Z29vZ2xlLmNvbQ== {"city":"Mountain View"} | ||
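As an illustration of how the command line above can be assembled for a ``string`` set, here is a small Python sketch. The helper `build_add_json_command` is hypothetical and not part of Suricata or its client tools; it only covers the base64 case described for the string type.

```python
import base64
import json


def build_add_json_command(set_name, set_type, value, info):
    """Build a dataset-add-json command line for a string-typed set.

    Hypothetical helper: the value is base64-encoded, and the JSON is
    serialized compactly so it matches the documented example.
    """
    encoded = base64.b64encode(value.encode("utf-8")).decode("ascii")
    payload = json.dumps(info, separators=(",", ":"))
    return f"dataset-add-json {set_name} {set_type} {encoded} {payload}"


command = build_add_json_command(
    "myset", "string", "google.com", {"city": "Mountain View"}
)
print(command)
```

Note that the JSON argument contains a space, so whatever parses this command line has to treat the last argument as free-form text.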
|
||
|
||
File formats | ||
------------ | ||
|
||
|
@@ -285,13 +367,41 @@ which when piped to ``base64 -d`` reveals its value:: | |
datarep | ||
~~~~~~~ | ||
|
||
The datarep format follows the dataset, expect that there are 1 more CSV | ||
The datarep format follows the dataset, except that there is 1 more CSV | ||
field: | ||
|
||
Syntax:: | ||
|
||
<data>,<value> | ||
|
||
.. _datajson_data: | ||
|
||
dataset with JSON enrichment | ||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
If ``format json`` is used in the parameters of a dataset keyword, then the loaded | ||
file has to contain a valid JSON object. | ||
|
||
If the ``value_key`` option is present, then the file has to contain a valid JSON | ||
object containing an array whose elements have a key equal to the ``value_key`` value. | ||
|
||
For example, if the file ``file.json`` looks like the following (typical of the response to a REST API call) :: | ||
|
||
{ | ||
"time": "2024-12-21", | ||
"response": { | ||
"threats": | ||
[ | ||
{"host": "toto.com", "origin": "japan"}, | ||
{"host": "grenouille.com", "origin": "french"} | ||
] | ||
} | ||
} | ||
|
||
then the match to check the list of threats using datajson can be defined as :: | ||
|
||
http.host; dataset:isset,threats, type string, load file.json, format json, enrichment_key threat, value_key host, array_key response.threats;
Review comment: So this matches on hosts like
Reply: I see the confusion here, I'm adding this example to the updated doc. |
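The lookup that the ``array_key`` and ``value_key`` options describe can be sketched in Python as follows. This is an illustration under assumptions, not Suricata's implementation: the function `extract_entries` is hypothetical, and treating ``array_key`` as a dot-separated path is inferred from the ``response.threats`` value used in the rule above.

```python
import json

# Same content as the file.json example above.
FILE_JSON = """
{
  "time": "2024-12-21",
  "response": {
    "threats": [
      {"host": "toto.com", "origin": "japan"},
      {"host": "grenouille.com", "origin": "french"}
    ]
  }
}
"""


def extract_entries(document, array_key, value_key):
    """Follow a dotted path to an array and index its elements by value_key."""
    node = document
    for part in array_key.split("."):
        node = node[part]
    if not isinstance(node, list):
        raise ValueError("array_key must point to a JSON array")
    entries = {
        item[value_key]: item
        for item in node
        if isinstance(item, dict) and value_key in item
    }
    # The text requires at least one element carrying value_key.
    if not entries:
        raise ValueError("no element contains the value_key")
    return entries


entries = extract_entries(json.loads(FILE_JSON), "response.threats", "host")
match = entries.get("toto.com")
print(match)
```

In this sketch, a lookup of ``toto.com`` returns the whole JSON element it came from, which is the data the text describes as being attached to the alert.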
||
|
||
.. _datasets_file_locations: | ||
|
||
File Locations | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -71,12 +71,11 @@ impl<'a> CommandParser<'a> { | |
} | ||
|
||
pub fn parse(&self, input: &str) -> Result<serde_json::Value, CommandParseError> { | ||
let parts: Vec<&str> = input.split(' ').map(|s| s.trim()).collect(); | ||
let mut parts: Vec<&str> = input.split(' ').map(|s| s.trim()).collect(); | ||
if parts.is_empty() { | ||
return Err(CommandParseError::Other("No command provided".to_string())); | ||
} | ||
let command = parts[0]; | ||
let args = &parts[1..]; | ||
|
||
let spec = self | ||
.commands | ||
|
@@ -91,6 +90,13 @@ impl<'a> CommandParser<'a> { | |
|
||
// Calculate the number of required arguments for better error reporting. | ||
let required = spec.iter().filter(|e| e.required).count(); | ||
let optional = spec.iter().filter(|e| !e.required).count(); | ||
// Handle the case where the command has only required arguments and allow | ||
// last one to contain spaces. | ||
if optional == 0 { | ||
parts = input.splitn(required + 1, ' ').collect(); | ||
} | ||
let args = &parts[1..]; | ||
|
||
let mut json_args = HashMap::new(); | ||
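The effect of the `splitn(required + 1, ' ')` change in this hunk can be illustrated with a short Python sketch (Python is used here only for illustration; `split_command` is a hypothetical helper, not part of the client). Limiting the number of splits keeps any remaining spaces inside the last argument, which matters when that argument is JSON.

```python
def split_command(line, required):
    """Split into the command plus at most `required` arguments.

    Mirrors the idea of Rust's `splitn(required + 1, ' ')`: after the
    limit is reached, the rest of the line stays in the last part.
    """
    return line.split(" ", required)


line = 'dataset-add-json myset string Z29vZ2xlLmNvbQ== {"city":"Mountain View"}'

naive_parts = line.split(" ")           # breaks the JSON at its inner space
limited_parts = split_command(line, 4)  # 4 required arguments for this command
```

With the naive split the JSON argument is cut in two, while the limited split keeps it whole as the fourth argument.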
|
||
|
@@ -386,6 +392,28 @@ fn command_defs() -> Result<HashMap<String, Vec<Argument>>, serde_json::Error> { | |
"type": "string", | ||
}, | ||
], | ||
"dataset-add-json": [ | ||
{ | ||
"name": "setname", | ||
"required": true, | ||
"type": "string", | ||
}, | ||
{ | ||
"name": "settype", | ||
"required": true, | ||
"type": "string", | ||
}, | ||
{ | ||
"name": "datavalue", | ||
"required": true, | ||
"type": "string", | ||
}, | ||
{ | ||
"name": "datajson", | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. datajson There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. we have the option datavalue just before and this part is the json so naming is ok I think. |
||
"required": true, | ||
"type": "string", | ||
}, | ||
], | ||
"get-flow-stats-by-id": [ | ||
{ | ||
"name": "flow_id", | ||
|
Review comment: Do we need both?
Reply: Some threat intel software, such as MISP, produces a list of IOCs with context in reply to a REST API call, so with them we can use the `json` format. But if scripting is involved to crunch the data, the `jsonline` format can be convenient to use, as it requires less memory to generate.