Merge pull request #702 from Varunshiyam/main
Fixes Malware Detection
Showing 3 changed files with 2,242 additions and 2,966 deletions.
Prediction Models/LSTM_Traffic_Forecasting/grab-traffic-mgmt.ipynb (2,966 changes: 0 additions & 2,966 deletions)
This file was deleted.
@@ -0,0 +1,80 @@

## Malware 🤖 Detection with Deep Learning 🧑🏻‍💻
This project implements a deep learning model for detecting malware on Android devices. The model analyzes the behavioral characteristics of applications to distinguish benign software from malicious software.

----

## Dataset
The dataset used in this project contains 100,000 observations, each representing an Android application. Each observation has 35 features extracted from the app's behavior in a Unix/Linux-based virtual machine. The features are listed in the table below.

----

## Feature Descriptions

| Feature | Description |
|---|---|
| hash | APK/SHA256 file name |
| millisecond | time |
| classification | malware/benign |
| state | flag of unrunnable/runnable/stopped tasks |
| usage_counter | task structure usage counter |
| prio | dynamic priority of a process |
| static_prio | static priority of a process |
| normal_prio | priority without taking RT-inheritance into account |
| policy | scheduling policy of the process |
| vm_pgoff | offset of the area in the file, in pages |
| vm_truncate_count | used to mark a vma as now dealt with |
| task_size | size of the current task |
| cached_hole_size | size of the free address space hole |
| free_area_cache | first address space hole |
| mm_users | address space users |
| map_count | number of memory areas |
| hiwater_rss | peak resident set size |
| total_vm | total number of pages |
| shared_vm | number of shared pages |
| exec_vm | number of executable pages |
| reserved_vm | number of reserved pages |
| nr_ptes | number of page table entries |
| end_data | end address of code component |
| last_interval | last interval time before thrashing |
| nvcsw | number of voluntary context switches |
| nivcsw | number of involuntary context switches |
| min_flt | minor page faults |
| maj_flt | major page faults |
| fs_excl_counter | holds file system exclusive resources |
| lock | read-write synchronization lock used for file system access |
| utime | user time |
| stime | system time |
| gtime | guest time |
| cgtime | cumulative group time; cumulative resource counter |
| signal_nvcsw | used as a cumulative resource counter |

------

## Methodology
- Data Loading and Preprocessing: The dataset is loaded, and the 'classification' column is mapped to binary values (0 for benign, 1 for malware). The rows are then shuffled.

- Exploratory Data Analysis: The distribution of the target variable ('classification') is visualized. A correlation matrix is generated to identify relationships between features.

- Feature Selection: Several features are dropped based on low correlation or redundancy.

- Data Normalization: The features are standardized with StandardScaler, which rescales each feature to zero mean and unit variance.

- Model Building: A deep neural network is constructed using TensorFlow/Keras. The model consists of multiple dense layers with ReLU activation functions and an output layer with a softmax activation function.

- Model Compilation and Training: The model is compiled with the Adam optimizer and the sparse categorical cross-entropy loss function. It is trained on the training data with a specified batch size and number of epochs.

- Evaluation: The trained model is evaluated on the test data, and the test loss and accuracy are printed.

- Fine-tuning: The model is trained further using an SGD optimizer with a lower learning rate and early stopping to potentially improve performance.
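
The data loading, feature selection, and normalization steps above can be sketched without external dependencies as follows. This is an illustration only, not the notebook's code: the function names, the toy rows, and the `min_abs_corr` threshold are hypothetical, and filtering by correlation with the target is an assumption about how "low correlation" was judged. The `standardize` function applies the same `(x - mean) / std` formula that StandardScaler uses, with the population standard deviation.

```python
import random
from math import sqrt
from statistics import fmean, pstdev

# Label encoding stated in the text: 0 = benign, 1 = malware.
LABELS = {"benign": 0, "malware": 1}


def encode_and_shuffle(rows, seed=0):
    """Map 'classification' to 0/1, then shuffle the rows reproducibly."""
    encoded = [{**row, "classification": LABELS[row["classification"]]} for row in rows]
    random.Random(seed).shuffle(encoded)
    return encoded


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    norm = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return cov / norm if norm else 0.0


def select_features(columns, target, min_abs_corr=0.1):
    """Keep columns whose |correlation with the target| reaches the threshold.

    The threshold is a hypothetical value, not one taken from the project.
    """
    return {n: v for n, v in columns.items() if abs(pearson(v, target)) >= min_abs_corr}


def standardize(columns):
    """Rescale each column to zero mean and unit variance (population std)."""
    scaled = {}
    for name, values in columns.items():
        mean, std = fmean(values), pstdev(values)
        # A constant column has std == 0; map it to zeros instead of dividing.
        scaled[name] = [(v - mean) / std if std else 0.0 for v in values]
    return scaled


# Toy rows: column names come from the feature table, values are made up.
rows = [
    {"classification": "malware", "prio": 3.0, "utime": 9.0, "lock": 7.0},
    {"classification": "benign", "prio": 1.0, "utime": 2.0, "lock": 7.0},
    {"classification": "malware", "prio": 4.0, "utime": 8.0, "lock": 7.0},
    {"classification": "benign", "prio": 2.0, "utime": 1.0, "lock": 7.0},
]
data = encode_and_shuffle(rows)
labels = [row["classification"] for row in data]
columns = {name: [row[name] for row in data] for name in ("prio", "utime", "lock")}
selected = select_features(columns, labels)  # the constant 'lock' column is dropped
scaled = standardize(selected)
```

In practice the same steps are usually done with pandas and scikit-learn; the pure-Python version is only meant to make each transformation explicit.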
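
To make the model output and the evaluation metrics concrete, the sketch below shows what a two-unit softmax layer, the sparse categorical cross-entropy loss, and accuracy compute for integer labels. It is a plain-Python illustration under assumed inputs, not the Keras implementation: the logits are made up, and the convention that index 0 is benign and index 1 is malware follows the label mapping described above.

```python
from math import exp, log


def softmax(logits):
    """Turn raw output scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_categorical_crossentropy(labels, probs):
    """Mean of -log(p[label]), where each label is an integer class index."""
    return -sum(log(p[y]) for y, p in zip(labels, probs)) / len(labels)


def accuracy(labels, probs):
    """Fraction of samples whose highest-probability class matches the label."""
    predicted = [max(range(len(p)), key=p.__getitem__) for p in probs]
    return sum(int(a == b) for a, b in zip(predicted, labels)) / len(labels)


# Made-up logits for three apps; index 0 = benign, index 1 = malware.
logits = [[2.0, -1.0], [0.5, 1.5], [0.0, 0.0]]
labels = [0, 1, 0]
probs = [softmax(z) for z in logits]
loss = sparse_categorical_crossentropy(labels, probs)
acc = accuracy(labels, probs)
```

Because the labels are integer class indices rather than one-hot vectors, the "sparse" variant of the loss fits a softmax output with one unit per class.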
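
The early stopping used during fine-tuning can be illustrated with a small patience rule. The source does not say which metric is monitored or what patience is used, so the validation-loss history, `patience`, and `min_delta` below are assumptions, and `early_stopping_epoch` is a hypothetical helper rather than a Keras API.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the index of the epoch at which training would stop.

    Training stops once the validation loss has failed to improve by more
    than `min_delta` for `patience` consecutive epochs. If that never
    happens, the index of the last epoch is returned.
    """
    best = float("inf")
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            # New best loss: remember it and reset the patience counter.
            best, waited = loss, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch
    return len(val_losses) - 1


# Made-up validation losses: improvement stalls after the third epoch.
history = [0.40, 0.31, 0.30, 0.32, 0.31, 0.33, 0.29]
stop_at = early_stopping_epoch(history, patience=3)
```

With this rule, the final value in `history` is never reached, because three epochs in a row fail to beat the best loss seen so far.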

-------

## Results
The model achieves over 99% accuracy in classifying malware on the test data. The training history is plotted to show how accuracy and loss change over the epochs.

------

## Conclusion
This project demonstrates that deep learning is effective for malware detection. By analyzing behavioral patterns, the model identifies malicious applications with high accuracy, which can contribute to improved security on Android devices.