GitHub - Duck-m-a-n/Malware_Detection_ML

👾 Android Malware Detection Using Machine Learning 🕵️‍♂️

This project aims to develop a machine learning model for the detection of Android malware using network traffic data. The model classifies network flows as either malware or benign based on various flow features.

Dataset

Kaggle: Android Malware Detection: Detection of Android Malware using Machine Learning

Dataset created by: Cyber Cop

The dataset consists of network flow data from various Android applications, including both benign and malware-infected apps.

Methodology

Data Preprocessing: The raw dataset is preprocessed to handle missing values, drop irrelevant features, and convert categorical features into numerical format.
Feature Selection: To identify the most relevant features for the classification task, the Mann-Whitney U test is performed on numerical features, and the Chi-squared test is performed on categorical features.
Model Selection: RandomForestClassifier (scikit-learn), XGBClassifier (XGBoost), and LGBMClassifier (LightGBM) are chosen for their performance on imbalanced datasets and their robustness against overfitting.
Handling Imbalanced Data: The Synthetic Minority Over-sampling Technique (SMOTE) is used to balance the dataset. It was preferred because it offered better performance and speed than other resampling techniques such as Tomek links and NearMiss.
Hyperparameter Tuning: Optuna, a hyperparameter optimization framework, is used to find the best hyperparameters for the chosen models.
Model Evaluation: The models are evaluated using the F2 score, the F-beta score with beta = 2, which weights recall more heavily than precision while still accounting for both. The precision-recall curve is also used to visualize the trade-off between precision and recall.
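
To make the choice of the F2 score concrete, here is a minimal sketch that computes the F-beta score from confusion-matrix counts. The function name and the counts are illustrative assumptions and are not taken from the project notebook. The two hypothetical models below have the same F1 score, but F2 prefers the one that misses fewer malware flows.

```python
def fbeta_from_counts(tp, fp, fn, beta=2.0):
    """F-beta score from true positives, false positives and false negatives.

    beta > 1 weights recall more heavily than precision; beta = 2 gives F2.
    """
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Hypothetical counts, for illustration only (malware is the positive class).
# Model A: precision 0.9, recall 0.6. Model B: precision 0.6, recall 0.9.
model_a = {"tp": 90, "fp": 10, "fn": 60}
model_b = {"tp": 90, "fp": 60, "fn": 10}

f1_a = fbeta_from_counts(**model_a, beta=1.0)
f1_b = fbeta_from_counts(**model_b, beta=1.0)
f2_a = fbeta_from_counts(**model_a, beta=2.0)
f2_b = fbeta_from_counts(**model_b, beta=2.0)

print(f"F1: A={f1_a:.3f}  B={f1_b:.3f}")
print(f"F2: A={f2_a:.3f}  B={f2_b:.3f}")
```

With real predictions, scikit-learn's `fbeta_score(y_true, y_pred, beta=2)` computes the same quantity directly from label arrays.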

Results

The optimized model, a RandomForestClassifier, was able to effectively classify network flows as malware or benign. The most important features for the classification were FLOW IAT min and FLOW IAT max, which represent the minimum and maximum inter-arrival times between data packets in a network flow, respectively.
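
As an illustration of what these two features measure, the following sketch derives inter-arrival time (IAT) statistics from the packet timestamps of a single flow. The function name, the timestamps, and the microsecond unit are assumptions made for the example. The dataset already ships these values precomputed, so this is not how the project obtains them.

```python
def flow_iat_stats(timestamps_us):
    """Return min, max and mean inter-arrival time for one flow.

    timestamps_us: packet arrival times in microseconds (assumed unit).
    Returns None when the flow has fewer than two packets, because no
    inter-arrival gap exists in that case.
    """
    if len(timestamps_us) < 2:
        return None
    ordered = sorted(timestamps_us)
    # Gap between each packet and the one that follows it.
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return {
        "iat_min": min(gaps),
        "iat_max": max(gaps),
        "iat_mean": sum(gaps) / len(gaps),
    }


# Hypothetical flow with four packets: gaps are 1500, 200 and 8000 microseconds.
example_flow = [0, 1500, 1700, 9700]
stats = flow_iat_stats(example_flow)
print(stats)
```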

Next Steps and Improvements

Further Feature Engineering: Explore additional feature engineering techniques to potentially improve model performance.
Deep Learning: Investigate deep learning techniques, such as Convolutional Neural Networks (CNNs) or Recurrent Neural Networks (RNNs), for classifying network flows.
Real-time Detection: Implement the model in a real-time network traffic monitoring system to detect Android malware in live network environments.
Model Interpretability: Investigate model interpretability techniques like SHAP (SHapley Additive exPlanations) to better understand how the model makes its predictions and to build trust in the results.
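
To give a sense of what SHAP would report, the sketch below computes exact Shapley values for a single prediction of a toy scoring function by enumerating every feature subset. It is an assumption-laden illustration: `toy_score`, the feature values, and the all-zero baseline are hypothetical, and the `shap` library uses faster, model-specific algorithms rather than this brute-force enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values of `predict` at `instance` relative to `baseline`.

    Features outside a coalition take their baseline value. The cost grows
    exponentially with the number of features, so this only suits toy cases.
    """
    n = len(instance)

    def value(coalition):
        mixed = [instance[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    contributions = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                phi += weight * (value(coalition | {i}) - value(coalition))
        contributions.append(phi)
    return contributions


def toy_score(features):
    """Hypothetical score with one interaction term (not the trained model)."""
    iat_min, iat_max, packet_count = features
    return 2.0 * iat_min - 1.0 * iat_max + 0.5 * iat_min * packet_count


instance = [1.0, 2.0, 4.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_score, instance, baseline)
print(phi)
```

The contributions sum to the difference between the score at the instance and the score at the baseline, which is the additivity property that gives SHAP its name.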
