Merge pull request #702 from Varunshiyam/main

Fixes Malware Detection
UppuluriKalyani · Nov 1, 2024 · 7e832ac
2 parents 44b8622 + d4c9d5a

Showing 3 changed files with 2,242 additions and 2,966 deletions.
diff --git a/Prediction Models/LSTM_Traffic_Forecasting/grab-traffic-mgmt.ipynb b/Prediction Models/LSTM_Traffic_Forecasting/grab-traffic-mgmt.ipynb
diff --git a/Prediction Models/Malware Detection/Readme.md b/Prediction Models/Malware Detection/Readme.md
@@ -0,0 +1,80 @@
+
+## Malware 🤖 Detection with Deep Learning 🧑🏻‍💻
+This project demonstrates the implementation of a deep learning model for malware detection on Android devices. The model analyzes behavioral characteristics of applications to distinguish between benign and malicious software.
+
+----
+
+## Dataset
+The dataset used in this project contains 100,000 observations, each representing an Android application. There are 35 features extracted from each app's behavior in a Unix/Linux-based virtual machine. They are described in the table below.
+
+----
+
+## Feature Descriptions
+
+| Feature | Description |
+|---|---|
+| hash | APK file name (SHA256 hash) |
+| millisecond | time, in milliseconds |
+| classification | label: malware or benign |
+| state | task state flag (unrunnable, runnable, or stopped) |
+| usage_counter | task structure usage counter |
+| prio | dynamic priority of the process |
+| static_prio | static priority of the process |
+| normal_prio | priority without taking RT inheritance into account |
+| policy | scheduling policy of the process |
+| vm_pgoff | offset of the area in the file, in pages |
+| vm_truncate_count | used to mark a VMA as now dealt with |
+| task_size | size of the current task's address space |
+| cached_hole_size | size of the free address space hole |
+| free_area_cache | first address space hole |
+| mm_users | number of address space users |
+| map_count | number of memory areas |
+| hiwater_rss | peak resident set size |
+| total_vm | total number of pages |
+| shared_vm | number of shared pages |
+| exec_vm | number of executable pages |
+| reserved_vm | number of reserved pages |
+| nr_ptes | number of page table entries |
+| end_data | end address of the data segment |
+| last_interval | last interval time before thrashing |
+| nvcsw | number of voluntary context switches |
+| nivcsw | number of involuntary context switches |
+| min_flt | minor page faults |
+| maj_flt | major page faults |
+| fs_excl_counter | number of file system exclusive resources held |
+| lock | read-write synchronization lock used for file system access |
+| utime | user time |
+| stime | system time |
+| gtime | guest time |
+| cgtime | cumulative group time (cumulative resource counter) |
+| signal_nvcsw | cumulative resource counter (voluntary context switches) |
+
+------
+
+
+## Methodology
+- Data Loading and Preprocessing: The dataset is loaded and the 'classification' column is mapped to binary values (0 for benign, 1 for malware). The data is then shuffled.
+
+- Exploratory Data Analysis: The distribution of the target variable ('classification') is visualized. A correlation matrix is generated to identify relationships between features.
+
+- Feature Selection: Several features are dropped because of low correlation with the target or redundancy with other features.
+
+- Data Normalization: The features are standardized with scikit-learn's StandardScaler, which rescales each feature to zero mean and unit variance.
+
+- Model Building: A deep neural network is constructed using TensorFlow/Keras. The model consists of multiple dense layers with ReLU activation functions and an output layer with a softmax activation function.
+
+- Model Compilation and Training: The model is compiled with the Adam optimizer and sparse categorical cross-entropy loss function. It is trained on the training data with a specified batch size and number of epochs.   
+
+- Evaluation: The trained model is evaluated on the test data, and the test loss and accuracy are printed.
+
+- Fine-tuning: The model is further trained using an SGD optimizer with a lower learning rate and early stopping to potentially improve performance.
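
The preprocessing and feature-selection steps above can be sketched in plain Python. This is not the project's code: it reimplements the same arithmetic with the standard library so it runs without dependencies. The toy rows, the `preprocess` helper, the 0.5 correlation threshold, and the shuffle seed are all hypothetical, since the README does not state them.

```python
import math
import random

# Hypothetical toy rows: column names come from the feature table,
# but the values are invented purely for illustration.
ROWS = [
    {"classification": "malware", "nvcsw": 10, "utime": 5, "static_prio": 120},
    {"classification": "benign", "nvcsw": 2, "utime": 4, "static_prio": 120},
    {"classification": "malware", "nvcsw": 12, "utime": 6, "static_prio": 120},
    {"classification": "benign", "nvcsw": 1, "utime": 5, "static_prio": 120},
]

LABELS = {"benign": 0, "malware": 1}


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def standardize(column):
    """z = (x - mean) / std using the population std, as StandardScaler does.
    A zero-variance column is only centered (its scale is treated as 1)."""
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in column) / n) or 1.0
    return [(x - mean) / std for x in column]


def preprocess(rows, threshold=0.5, seed=42):
    """Encode labels, shuffle, drop weakly correlated features, standardize.
    threshold and seed are hypothetical values, not taken from the project."""
    data = [dict(r, classification=LABELS[r["classification"]]) for r in rows]
    random.Random(seed).shuffle(data)
    y = [r["classification"] for r in data]
    features = [k for k in data[0] if k != "classification"]
    kept = [f for f in features
            if abs(pearson([r[f] for r in data], y)) >= threshold]
    # In a real pipeline, compute mean/std on the training split only.
    columns = [standardize([r[f] for r in data]) for f in kept]
    X = [list(row) for row in zip(*columns)]
    return X, y, kept


if __name__ == "__main__":
    X, y, kept = preprocess(ROWS)
    # static_prio is constant, so its correlation is 0 and it is dropped.
    print(kept)  # → ['nvcsw', 'utime']
```

Pure Python is used here only to keep the sketch dependency-free; with scikit-learn, `StandardScaler().fit_transform` performs the same z-scoring on a feature matrix.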
+
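The model, loss, and early-stopping steps can likewise be illustrated without TensorFlow. This sketch assumes a two-unit softmax output (one unit per class, 0 = benign and 1 = malware); the layer sizes, the `LAYERS` weights, and the helper names are hypothetical. It covers only the forward pass through ReLU dense layers, the per-sample sparse categorical cross-entropy, and a simplified version of the patience rule behind Keras's `EarlyStopping` callback (`patience`, `min_delta`). Training with Adam or SGD is not shown.

```python
import math

# Hypothetical weights for a tiny 2-input network: one hidden layer of
# 2 ReLU units and a 2-unit softmax output (class 0 = benign, 1 = malware).
LAYERS = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),  # hidden layer (ReLU)
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),   # output layer (softmax)
]


def dense(v, weights, bias):
    """Fully connected layer: one weight row per output unit."""
    return [sum(w * x for w, x in zip(row, v)) + b
            for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, x) for x in v]


def softmax(z):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(z)
    exps = [math.exp(x - m) for x in z]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x, layers):
    """ReLU on every hidden layer, softmax on the final layer."""
    h = x
    for weights, bias in layers[:-1]:
        h = relu(dense(h, weights, bias))
    weights, bias = layers[-1]
    return softmax(dense(h, weights, bias))


def sparse_categorical_crossentropy(label, probs, eps=1e-7):
    """The integer label selects the true class's probability: -log p[label]."""
    return -math.log(max(probs[label], eps))


def early_stop_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the epoch index at which training would stop, or None.
    Stops once validation loss fails to improve by more than min_delta
    for `patience` consecutive epochs."""
    best, wait = math.inf, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


if __name__ == "__main__":
    probs = forward([2.0, 1.0], LAYERS)
    print(probs, sparse_categorical_crossentropy(1, probs))
    print(early_stop_epoch([0.5, 0.4, 0.41, 0.42, 0.43]))  # → 4
```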
+-------
+
+## Results
+The model achieves high accuracy (over 99%) in classifying malware on the test data. The training history is plotted to show how accuracy and loss change over the epochs.
+
+------
+
+## Conclusion
+This project demonstrates the effectiveness of deep learning in malware detection. By analyzing behavioral patterns, the model can identify malicious applications with high accuracy, contributing to improved security on Android devices.