This project is a Python-based Advanced Persistent Threat (APT) detection system that uses the Hybrid HHOSSA optimization technique for feature selection and data balancing. It integrates LightGBM and Bi-LSTM models for classification and provides real-time detection with a monitoring dashboard.
- Project Overview
- Setup Instructions
- Usage
- File Structure
- Integrating Real-Time Data Sources for APT Detection
- Contributing
- License
This APT detection system consists of the following components:
- Data Preprocessing: Load and clean the dataset, and extract features.
- Feature Selection: Select significant features using the HHOSSA technique.
- Data Balancing: Balance the dataset using HHOSSA-SMOTE.
- Model Training: Train LightGBM and Bi-LSTM models.
- Model Evaluation: Evaluate models using accuracy and ROC-AUC.
- Real-Time Detection: Ingest real-time data using Kafka.
- Monitoring Dashboard: Visualize data and model performance using Flask and Plotly.
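A simplified, runnable analog of this pipeline is sketched below. It is an illustration only: `SelectKBest` and `SMOTE` stand in for HHOSSA feature selection and HHOSSA-SMOTE, plain LightGBM stands in for the LightGBM + Bi-LSTM hybrid, and a synthetic dataset stands in for the real one; see `main.py` for the actual wiring.

```python
# Simplified, runnable analog of the pipeline described above.
# SelectKBest and SMOTE are stand-ins for HHOSSA feature selection and
# HHOSSA-SMOTE; LightGBM alone is a stand-in for the hybrid classifier.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, roc_auc_score
from imblearn.over_sampling import SMOTE
import lightgbm as lgb

# Synthetic, imbalanced stand-in for the real dataset
X, y = make_classification(n_samples=2000, n_features=30, weights=[0.9, 0.1],
                           random_state=42)

X = SelectKBest(f_classif, k=15).fit_transform(X, y)   # feature selection stand-in
X, y = SMOTE(random_state=42).fit_resample(X, y)       # data balancing stand-in

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)
model = lgb.LGBMClassifier(n_estimators=200).fit(X_train, y_train)

# Evaluate with the same metrics the project reports: accuracy and ROC-AUC
y_prob = model.predict_proba(X_test)[:, 1]
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))
print("ROC-AUC :", roc_auc_score(y_test, y_prob))
```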
- Python 3.8 or higher
- Java Development Kit (JDK) 11 or higher
- Kafka
- Zookeeper
- Clone the Repository

  ```bash
  git clone https://github.com/Ap6pack/APT-Detection-System.git
  cd APT-Detection-System
  ```
- Create and Activate a Virtual Environment

  ```bash
  python3 -m venv venv
  source venv/bin/activate
  ```
- Install Dependencies

  ```bash
  pip install -r requirements.txt
  ```
- Install Java (if not already installed)

  Debian/Ubuntu:

  ```bash
  sudo apt update
  sudo apt install openjdk-11-jdk
  java -version
  ```

  RHEL/CentOS:

  ```bash
  sudo yum install java-11-openjdk-devel
  java -version
  ```

  macOS (Homebrew):

  ```bash
  brew update
  brew install openjdk@11
  echo 'export PATH="/usr/local/opt/openjdk@11/bin:$PATH"' >> ~/.zshrc
  echo 'export JAVA_HOME=$(/usr/libexec/java_home -v 11)' >> ~/.zshrc
  source ~/.zshrc
  java -version
  ```
- Download Kafka

  Download Kafka from the official Apache website and extract the archive.
- Start Zookeeper and Kafka

  ```bash
  # Start Zookeeper
  kafka_2.13-3.8.0/bin/zookeeper-server-start.sh kafka_2.13-3.8.0/config/zookeeper.properties

  # Start Kafka (in a new terminal)
  kafka_2.13-3.8.0/bin/kafka-server-start.sh kafka_2.13-3.8.0/config/server.properties
  ```
- Create Kafka Topic

  ```bash
  kafka_2.13-3.8.0/bin/kafka-topics.sh --create --topic apt_topic --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1
  ```
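If you prefer to create the topic from Python, here is a minimal sketch using the kafka-python admin client (the same library `produce_messages.py` uses); it assumes the broker is already running on localhost:9092.

```python
# Optional: create the apt_topic topic from Python instead of the CLI.
from kafka.admin import KafkaAdminClient, NewTopic
from kafka.errors import TopicAlreadyExistsError

admin = KafkaAdminClient(bootstrap_servers="localhost:9092")
try:
    admin.create_topics([NewTopic(name="apt_topic", num_partitions=1, replication_factor=1)])
    print("Created topic apt_topic")
except TopicAlreadyExistsError:
    print("Topic apt_topic already exists")
finally:
    admin.close()
```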
Create a new file `produce_messages.py`:
```python
from kafka import KafkaProducer

def produce_messages():
    # Connect to the local Kafka broker started above
    producer = KafkaProducer(bootstrap_servers='localhost:9092')
    for i in range(10):
        message = f"Message {i}"
        # Send each message to apt_topic as UTF-8 bytes
        producer.send('apt_topic', value=message.encode('utf-8'))
        print(f"Sent: {message}")
    # Block until all buffered messages have been delivered
    producer.flush()

if __name__ == "__main__":
    produce_messages()
```
Run the script to send messages to the Kafka topic:

```bash
python3 produce_messages.py
```
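To verify that the messages arrived, you can read them back with a simple consumer. A minimal sketch using kafka-python; the two-second timeout is just for a quick check:

```python
# Quick check that messages reached apt_topic; uses kafka-python,
# the same library as produce_messages.py.
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    'apt_topic',
    bootstrap_servers='localhost:9092',
    auto_offset_reset='earliest',   # read from the beginning of the topic
    consumer_timeout_ms=2000,       # stop after 2 s with no new messages
)
for record in consumer:
    print(f"Received: {record.value.decode('utf-8')}")
consumer.close()
```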
Run the main application:

```bash
python3 main.py
```
Open your web browser and go to http://127.0.0.1:5000/ to view the dashboard.
APT_Detection_System/
├── dashboard/
│ ├── __init__.py
│ ├── app.py
│ └── templates/
│ └── index.html
├── data_preprocessing/
│ ├── __init__.py
│ ├── preprocess.py
│ ├── data_cleaning.py
│ └── feature_engineering.py
├── evaluation/
│ ├── __init__.py
│ ├── evaluation_metrics.py
│ └── cross_validation.py
├── feature_selection/
│ ├── __init__.py
│ └── hhosssa_feature_selection.py
├── data_balancing/
│ ├── __init__.py
│ └── hhosssa_smote.py
├── models/
│ ├── __init__.py
│ ├── train_models.py
│ ├── lightgbm_model.py
│ ├── bilstm_model.py
│ └── hybrid_classifier.py
├── real_time_detection/
│ ├── __init__.py
│ ├── data_ingestion.py
│ └── prediction_engine.py
├── visualization.py
├── main.py
├── produce_messages.py
└── requirements.txt
This guide provides detailed configurations for integrating various real-time data sources to enhance Advanced Persistent Threat (APT) detection capabilities.
- Security Information and Event Management (SIEM)
- Intrusion Detection Systems (IDS) and Intrusion Prevention Systems (IPS)
- Endpoint Detection and Response (EDR)
- Threat Intelligence Feeds
- Network Traffic Analysis
- Install Splunk Forwarder:

  ```bash
  wget -O splunkforwarder.tgz "https://download.splunk.com/products/universalforwarder/releases/8.1.3/linux/splunkforwarder-8.1.3-aeae3fe429ae-Linux-x86_64.tgz"
  tar -xvf splunkforwarder.tgz
  ./splunkforwarder/bin/splunk start --accept-license
  ./splunkforwarder/bin/splunk enable boot-start
  ```
- Configure Data Inputs:
- Add data inputs through the Splunk Web UI (Settings > Data Inputs).
- Monitor specific directories, files, or network ports. Events can also be pushed to Splunk programmatically, as shown in the sketch after this list.
- Create Dashboards and Alerts:
- Use the Splunk Search Processing Language (SPL) to create queries.
- Build dashboards in the Splunk Web UI (Dashboard > Create New Dashboard).
- Set up alerts (Alerts > Create New Alert) based on query results.
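As a complement to the data-input configuration above, detection events from this system can also be pushed into Splunk over the HTTP Event Collector (HEC). This is a hedged sketch using the requests library: it assumes HEC is enabled on the default port 8088 and that you have created an HEC token, and the index, sourcetype, and event fields are illustrative only.

```python
# Hedged sketch: forward an APT detection event to Splunk via the HTTP Event
# Collector (HEC). Assumes HEC is enabled on port 8088 and HEC_TOKEN is a
# token you created in Splunk; the index and field names are illustrative.
import requests

SPLUNK_HEC_URL = "https://localhost:8088/services/collector/event"
HEC_TOKEN = "REPLACE-WITH-YOUR-HEC-TOKEN"

def send_detection(event: dict) -> None:
    payload = {"event": event, "sourcetype": "apt:detection", "index": "main"}
    resp = requests.post(
        SPLUNK_HEC_URL,
        headers={"Authorization": f"Splunk {HEC_TOKEN}"},
        json=payload,
        verify=False,  # self-signed certificates are common on local Splunk installs
        timeout=10,
    )
    resp.raise_for_status()

if __name__ == "__main__":
    send_detection({"alert": "possible APT activity", "score": 0.93, "host": "workstation-42"})
```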
- Install Snort:

  ```bash
  sudo apt-get install snort
  ```
- Configure snort.conf:
- Edit `/etc/snort/snort.conf` to set HOME_NET and EXTERNAL_NET.
- Define rule paths: `var RULE_PATH /etc/snort/rules`.
- Set output plugins for logging.
- Update Rules:
- Download rules from Snort.org or subscribe to rule updates.
- Place rule files in `/etc/snort/rules`.
- Start Snort:

  ```bash
  sudo snort -c /etc/snort/snort.conf -i eth0
  ```
- Log Management:
- Configure Snort to log to a centralized log management system like Splunk or ELK Stack.
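For example, Snort alerts can be tailed from its log file and forwarded into the Kafka topic this project consumes. A hedged sketch: the log path `/var/log/snort/alert` depends on your output-plugin configuration, and lines are forwarded raw rather than parsed.

```python
# Hedged sketch: tail Snort's alert log and forward each new line to the
# apt_topic Kafka topic used by this project. The log path depends on your
# snort.conf output configuration; /var/log/snort/alert is an assumption.
import time
from kafka import KafkaProducer

SNORT_ALERT_LOG = "/var/log/snort/alert"

def tail_snort_alerts():
    producer = KafkaProducer(bootstrap_servers="localhost:9092")
    with open(SNORT_ALERT_LOG, "r", errors="replace") as log:
        log.seek(0, 2)  # start at the end of the file, like `tail -f`
        while True:
            line = log.readline()
            if not line:
                time.sleep(1)
                continue
            producer.send("apt_topic", value=line.strip().encode("utf-8"))

if __name__ == "__main__":
    tail_snort_alerts()
```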
- Deploy Falcon Agent:
- Obtain the Falcon installer from the CrowdStrike portal.
- Install on endpoints:

  ```bash
  sudo apt-get install falcon-sensor
  sudo systemctl start falcon-sensor
  sudo systemctl enable falcon-sensor
  ```
- Configure Policies:
- In the CrowdStrike console, configure detection and prevention policies.
- Integrate with SIEM:
- Use the CrowdStrike API to pull event data into your SIEM, as sketched after this list.
- Set Alerts:
- Configure alerts in the CrowdStrike console based on detection events.
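As referenced in the Integrate with SIEM step, detections can be pulled programmatically. The sketch below authenticates with OAuth2 client credentials and queries for recent detection IDs; the base URL, endpoint path, and sort parameter are assumptions to verify against your CrowdStrike API documentation, and the client ID/secret come from an API client you create in the console.

```python
# Hedged sketch: pull recent detection IDs from the CrowdStrike Falcon API.
# The base URL and endpoint path are assumptions to confirm against your
# Falcon API documentation; CLIENT_ID/CLIENT_SECRET come from an API client
# created in the CrowdStrike console.
import requests

BASE_URL = "https://api.crowdstrike.com"
CLIENT_ID = "REPLACE-WITH-CLIENT-ID"
CLIENT_SECRET = "REPLACE-WITH-CLIENT-SECRET"

def get_token() -> str:
    resp = requests.post(
        f"{BASE_URL}/oauth2/token",
        data={"client_id": CLIENT_ID, "client_secret": CLIENT_SECRET},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["access_token"]

def list_recent_detections(limit: int = 10) -> list:
    headers = {"Authorization": f"Bearer {get_token()}"}
    resp = requests.get(
        f"{BASE_URL}/detects/queries/detects/v1",   # assumed endpoint path
        headers=headers,
        params={"limit": limit},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json().get("resources", [])

if __name__ == "__main__":
    for detection_id in list_recent_detections():
        print(detection_id)
```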
- Create OTX Account:
- Sign up at AlienVault OTX.
- Integrate with SIEM:
- Use the OTX API to integrate threat data with SIEM systems, as sketched after this list.
- Set Alerts and Workflows:
- In your SIEM, create correlation rules based on OTX indicators.
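As referenced in the Integrate with SIEM step, OTX pulses can be fetched over its REST API. A minimal sketch using the requests library; the subscribed-pulses endpoint and response fields shown should be confirmed against the OTX API documentation, and the API key comes from your OTX account settings.

```python
# Hedged sketch: fetch recently updated pulses from AlienVault OTX and print
# their indicators. OTX_API_KEY comes from your OTX account settings.
import requests

OTX_API_KEY = "REPLACE-WITH-YOUR-OTX-API-KEY"
OTX_URL = "https://otx.alienvault.com/api/v1/pulses/subscribed"

def fetch_indicators(limit: int = 5) -> None:
    resp = requests.get(
        OTX_URL,
        headers={"X-OTX-API-KEY": OTX_API_KEY},
        params={"limit": limit},
        timeout=10,
    )
    resp.raise_for_status()
    for pulse in resp.json().get("results", []):
        print(pulse["name"])
        for indicator in pulse.get("indicators", []):
            print(f"  {indicator['type']}: {indicator['indicator']}")

if __name__ == "__main__":
    fetch_indicators()
```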
- Install Zeek:

  ```bash
  # Package name assumed; Zeek may require adding its official package
  # repository first (see the Zeek documentation for your distribution).
  sudo apt-get install zeek
  ```
- Configure Network Interfaces:
- Edit `/usr/local/zeek/etc/node.cfg` to define network interfaces for monitoring.
- Edit zeek.cfg:
- Set paths for logs and scripts: `LogDir = /var/log/zeek`.
- Deploy Scripts:
- Use built-in and custom Zeek scripts for specific detections.
- Integrate with SIEM:
- Send Zeek logs to SIEM for correlation and analysis.
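For instance, Zeek's connection log can be parsed and streamed into the same Kafka topic this project consumes. A hedged sketch: the log path assumes a default ZeekControl deployment writing to `/usr/local/zeek/logs/current/conn.log`, and only a few columns are forwarded.

```python
# Hedged sketch: read Zeek's tab-separated conn.log and forward selected
# fields to the apt_topic Kafka topic. The log path assumes a default
# ZeekControl deployment; adjust it to match your LogDir setting.
import json
from kafka import KafkaProducer

CONN_LOG = "/usr/local/zeek/logs/current/conn.log"
FIELDS_OF_INTEREST = ["ts", "id.orig_h", "id.resp_h", "id.resp_p", "proto", "conn_state"]

def forward_conn_log():
    producer = KafkaProducer(bootstrap_servers="localhost:9092")
    columns = []
    with open(CONN_LOG, "r", errors="replace") as log:
        for line in log:
            if line.startswith("#fields"):
                # The #fields header names the tab-separated columns
                columns = line.strip().split("\t")[1:]
                continue
            if line.startswith("#") or not columns:
                continue
            values = dict(zip(columns, line.rstrip("\n").split("\t")))
            record = {field: values.get(field) for field in FIELDS_OF_INTEREST}
            producer.send("apt_topic", value=json.dumps(record).encode("utf-8"))
    producer.flush()

if __name__ == "__main__":
    forward_conn_log()
```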
By following these configurations, you can effectively integrate various real-time data sources to enhance your APT detection capabilities.
Contributions are welcome! Please open an issue or submit a pull request for any improvements or additions.
This project is licensed under the MIT License. See the LICENSE file for more details.