Coded Network Anomaly Detection application customized for Jupiter Orchestrator (available here: https://github.com/ANRGUSC/Jupiter).
The application task graph, shown below, is intended for dispersed computing. It is inspired by Hashdoop [1, 2], where a MapReduce framework is used for anomaly detection. We have modified the code from [2] to suit our purpose.
Convert the pcap file to a text file using Ipsumdump as follows:
ipsumdump -tsSdDlpF -r botnet-capture-20110810-neris.pcap > botnet_summary.ipsum
A single input file, 1botnet.ipsum, is provided in the repository.
Jupiter accepts pipelined computations described in the form of a graph, where the main task flow is represented as a Directed Acyclic Graph (DAG).
Thus, one should be able to separate the graph into two parts: the DAG part and the non-DAG part.
Jupiter requires each task in the DAG part of the graph to be written as a Python function in a separate file under the scripts folder.
The non-DAG tasks, on the other hand, can be either a Python function or a shell script with any number of arguments, also located under the scripts folder.
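As an illustration only, a skeleton of such a DAG task script might look as follows. The task(filename, pathin, pathout) entry point and the convention of returning the produced output paths are assumptions based on Jupiter's example applications, not something defined in this repository; consult the Jupiter guidelines for the exact interface.

```python
# Hypothetical skeleton of a DAG task script under scripts/ (e.g., aggregate0.py).
# The task(filename, pathin, pathout) signature is an assumption taken from
# Jupiter's example applications.
import os

def task(filename, pathin, pathout):
    # Read the file handed over by the parent task.
    with open(os.path.join(pathin, filename)) as f:
        lines = f.readlines()

    # ... task-specific processing of `lines` goes here ...

    # Write the result where the child tasks expect it and report the
    # produced output path(s) back to Jupiter.
    output_path = os.path.join(pathout, filename)
    with open(output_path, "w") as f:
        f.writelines(lines)
    return [output_path]
```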
The folder structure is:
├── configuration.txt
├── DAG_directed.jpg
├── DAG.jpg
├── input_node.txt
├── LICENSE.txt
├── README.md
├── sample_input
│   ├── 1botnet.ipsum
│   └── 2botnet.ipsum
└── scripts
    ├── aggregate0.py
    ├── aggregate1.py
    ├── aggregate2.py
    ├── astutedetector0.py
    ├── astutedetector1.py
    ├── astutedetector2.py
    ├── config.json
    ├── dftdetector0.py
    ├── dftdetector1.py
    ├── dftdetector2.py
    ├── dftslave.py
    ├── fusioncenter0.py
    ├── fusioncenter1.py
    ├── fusioncenter2.py
    ├── globalfusion.py
    ├── localpro.py
    ├── masterServerStatic.py
    ├── parseFile.py
    ├── simpledetector0.py
    ├── simpledetector1.py
    ├── simpledetector2.py
    ├── teradetector0.py
    ├── teradetector1.py
    ├── teradetector2.py
    └── terasort
        ├── master
        │   ├── standby.sh
        │   ├── ...
        └── worker
            └── rate-limit-then-standby.sh
This folder is required by the Jupiter guidelines as well as for testing; it can be left as an empty directory.
According to the Jupiter guidelines, there MUST be a config.json file with an entry called taskname_map to designate the DAG and non-DAG parts of the task graph.
In taskname_map, each entry is represented as follows:
<task_name> <task_file> <DAG_flag> <Arguments>
<task_name> is the name of the DAG task, such as simpledetector0.
<task_file> is the name of the file to be run internally.
<DAG_flag> indicates whether the task is part of the DAG; if yes, set it to true, otherwise set it to false.
<Arguments> holds the arguments for tasks that are not part of the DAG portion of the task graph.
For example, as aggregate0 is part of the DAG, its entry should be as follows:
"aggregate0" : ["aggregate0", true]
On the other hand, dftslave00 is not part of the DAG portion, so its entry should be as follows:
"dftslave00" : ["dftslave", false, 0, 0, 3]
Here the arguments 0, 0, 3 are required by the dftslave script and represent the master ID, slave ID, and maximum number of slaves, respectively.
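Putting the two entries above together, a minimal (abridged) sketch of config.json could look as follows; the actual file under the scripts folder lists every task of the graph:

```json
{
  "taskname_map": {
    "aggregate0": ["aggregate0", true],
    "dftslave00": ["dftslave", false, 0, 0, 3]
  }
}
```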
To conform with the Jupiter guidelines, there MUST be a configuration.txt file representing the graph.
Each line of the file lists the children of a node.
To prepare the configuration file, the task graph first needs to be converted to a fully directed graph.
This can be done by running a Breadth First Search (BFS) type algorithm over the task graph.
The resulting directed graph is shown in DAG_directed.jpg.
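A minimal sketch of such a BFS-based conversion is shown below, assuming the task graph is given as an undirected adjacency list and every edge is directed from the endpoint visited earlier to the endpoint visited later; the adjacency list in the example is a hypothetical fragment, not the full task graph.

```python
from collections import deque

def orient_edges(adj, root):
    """Direct every edge of an undirected task graph from the endpoint
    reached earlier by a BFS from `root` to the endpoint reached later.
    Assumes the graph is connected from `root`."""
    order = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbor in adj.get(node, []):
            if neighbor not in order:
                order[neighbor] = len(order)
                queue.append(neighbor)
    children = {node: [] for node in order}
    for node, neighbors in adj.items():
        for neighbor in neighbors:
            if order[node] < order[neighbor]:
                children[node].append(neighbor)
    return children

# Illustrative use with a tiny hypothetical fragment of the task graph:
adj = {
    "localpro": ["aggregate0"],
    "aggregate0": ["localpro", "simpledetector0", "fusioncenter0"],
    "simpledetector0": ["aggregate0", "fusioncenter0"],
    "fusioncenter0": ["aggregate0", "simpledetector0"],
}
print(orient_edges(adj, "localpro"))
```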
Now each non-leaf node, and each leaf node that is part of the original DAG portion, is represented in the configuration file as
<node-name> <number of inputs required> <flag stating whether to wait for all the inputs> <child1-name> <child2-name> ...
All other nodes (i.e., leaf nodes that are not part of the original DAG portion), such as DFTslave00, are represented as
<node-name> 1 False <node-name>
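For illustration only, a hypothetical fragment of such a configuration.txt is shown below; the localpro line assumes the three aggregate tasks are its children and that it waits for its single input, while the dftslave00 line follows the leaf-node pattern above. The actual child lists must match the directed task graph.

```
localpro 1 True aggregate0 aggregate1 aggregate2
dftslave00 1 False dftslave00
```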
The input_node.txt file is required by the WAVE scheduler; it initiates the WAVE scheduler's random mapping.
It contains a random mapping for the tasks that have no parent.
For example, the given input_node.txt file randomly selects node4 for the localpro task.
The scripts folder contains all the executables related to the task graph.
localpro.py: Processes the Ipsum file locally and splits the traffic into multiple independent streams based on the hash value of the IP addresses.
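A minimal sketch of this hashing step is given below; the hash function, the assumption that the source IP is the second whitespace-separated field of each ipsum line, and the handling of comment lines are illustrative rather than taken from localpro.py.

```python
import hashlib

NUM_SPLITS = 3  # splits 0, 1, 2 as in this application

def split_id(ip_address, num_splits=NUM_SPLITS):
    """Map an IP address to one of the independent traffic splits."""
    digest = hashlib.md5(ip_address.encode()).hexdigest()
    return int(digest, 16) % num_splits

def split_traffic(ipsum_path):
    """Partition the lines of an .ipsum file by source-IP hash (illustrative)."""
    splits = {i: [] for i in range(NUM_SPLITS)}
    with open(ipsum_path) as f:
        for line in f:
            if line.startswith("!") or not line.strip():
                continue  # skip ipsumdump header/comment lines
            src_ip = line.split()[1]  # assumed position of the source IP
            splits[split_id(src_ip)].append(line)
    return splits
```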
aggregate<SPLIT_ID>.py: SPLIT_ID (0, 1, or 2 in our case) uniquely identifies the split. This script aggregates traffic for a particular traffic split from different monitoring nodes.
simpledetector<SPLIT_ID>.py: A simple threshold-based anomaly detector for the particular split.
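For intuition only, a detector of this kind can flag time bins whose traffic volume exceeds the mean by a multiple of the standard deviation; the statistic and threshold used by the actual simpledetector scripts may differ.

```python
from statistics import mean, stdev

def threshold_anomalies(counts, k=1.5):
    """Return indices of time bins whose count exceeds
    mean + k * standard deviation (illustrative threshold rule)."""
    if len(counts) < 2:
        return []
    mu, sigma = mean(counts), stdev(counts)
    return [i for i, c in enumerate(counts) if c > mu + k * sigma]

# Example: packets-per-bin for one traffic split; flags bin 3 with k=1.5.
print(threshold_anomalies([120, 118, 130, 900, 125]))
```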
astutedetector<SPLIT_ID>.py: An implementation of the ASTUTE anomaly detector [3] from the repository [2].
dftdetector<SPLIT_ID>.py: An implementation of the DFT detector (master) that communicates with the DFT slave nodes to perform coded/uncoded Discrete Fourier Transform analysis of the data.
dftslave.py: A common implementation of the DFT slave detectors that is called for all the DFTslave## tasks.
terasort/master: This folder contains all the scripts required to run the Terasort master, which coordinates the coded/uncoded Terasort analysis of the data among the Terasort workers.
terasort/worker: This folder contains all the scripts required to run the Terasort worker tasks.
teradetector<SPLIT_ID>.py: As the Terasort master and worker task nodes are not directly part of the DAG, we have added this extra task to connect them to the DAG portion of the graph. This script dispatches the input data to teramaster<SPLIT_ID> for processing, collects the output of the analysis, and passes it to the fusion center.
fusioncenter<SPLIT_ID>.py: Combines the anomalies detected by the different detectors for the particular split.
globalfusion.py: Collects the anomalies from the different splits and combines the detected anomalies.
This code is customized to be executed with Jupiter only.
[1] Romain Fontugne, Johan Mazel, and Kensuke Fukuda. "Hashdoop: A MapReduce framework for network anomaly detection." In IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), IEEE, 2014.
[2] Hashdoop GitHub Repository
[3] Fernando Silveira, Christophe Diot, Nina Taft, and Ramesh Govindan. "ASTUTE: Detecting a different class of traffic anomalies." ACM SIGCOMM Computer Communication Review 40.4 (2010): 267-278.
This material is based upon work supported by Defense Advanced Research Projects Agency (DARPA) under Contract No. HR001117C0053. Any views, opinions, and/or findings expressed are those of the author(s) and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government.