This project is a simple example that trains a model to predict whether a website is a phishing website. Phishing websites are fake websites that try to gain users' trust in order to steal their private data.
- Best accuracy score - 97.0% using the random forest method
- Worst accuracy score - 48.5% using the one-class SVM method
- Scikit-learn (sklearn)
- NumPy
Requirements can be installed by running: pip install -r requirements.txt
The training data set is taken from the UCI Machine Learning Repository.
Run
python classifier.py
to check the accuracy of the classifier.
Run
python classifier.py google.com
to check whether google.com is a phishing website or not.
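The internals of classifier.py are not shown here. As a rough sketch of what an accuracy check for the two methods mentioned above could look like in scikit-learn, the example below trains a random forest and a one-class SVM and scores both on a held-out split. The data is synthetic stand-in data generated on the spot, and the split ratio and model parameters are assumptions, so the printed numbers will not match the scores reported above.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import OneClassSVM

# Synthetic stand-in data (NOT the real data set): 30 features with values
# in {-1, 0, 1} and a label in {-1, 1}. Replace with rows from the real data.
rng = np.random.default_rng(0)
X = rng.integers(-1, 2, size=(500, 30))
# Arbitrary labelling rule so that there is something to learn.
y = np.where(X[:, :5].sum(axis=1) >= 0, 1, -1)

# Hold out 20% of the rows for measuring accuracy (assumed ratio).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Supervised model: learns from features and labels.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
forest_acc = accuracy_score(y_test, forest.predict(X_test))

# One-class SVM is fit without labels and predicts +1 (inlier) or -1 (outlier).
ocsvm = OneClassSVM(gamma="auto")
ocsvm.fit(X_train)
ocsvm_acc = accuracy_score(y_test, ocsvm.predict(X_test))

print(f"Random forest accuracy: {forest_acc:.3f}")
print(f"One-class SVM accuracy: {ocsvm_acc:.3f}")
```

Because the one-class SVM never sees the labels during fitting, it is not surprising for it to score far lower than a supervised classifier on a two-class problem.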
Each row in the dataset contains all of the following attributes, separated by commas. The braces list the values each attribute can take; the last attribute, Result, is the class label.
- having_IP_Address { -1,1 }
- URL_Length { 1,0,-1 }
- Shortining_Service { 1,-1 }
- having_At_Symbol { 1,-1 }
- double_slash_redirecting { -1,1 }
- Prefix_Suffix { -1,1 }
- having_Sub_Domain { -1,0,1 }
- SSLfinal_State { -1,1,0 }
- Domain_registeration_length { -1,1 }
- Favicon { 1,-1 }
- port { 1,-1 }
- HTTPS_token { -1,1 }
- Request_URL { 1,-1 }
- URL_of_Anchor { -1,0,1 }
- Links_in_tags { 1,-1,0 }
- SFH { -1,1,0 }
- Submitting_to_email { -1,1 }
- Abnormal_URL { -1,1 }
- Redirect { 0,1 }
- on_mouseover { 1,-1 }
- RightClick { 1,-1 }
- popUpWidnow { 1,-1 }
- Iframe { 1,-1 }
- age_of_domain { -1,1 }
- DNSRecord { -1,1 }
- web_traffic { -1,0,1 }
- Page_Rank { -1,1 }
- Google_Index { 1,-1 }
- Links_pointing_to_page { 1,0,-1 }
- Statistical_report { -1,1 }
- Result { -1,1 }
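To illustrate the row format described above, the sketch below splits one comma-separated record into the 30 features and the Result label. The helper name parse_row and the sample record are made up for illustration; they are not taken from classifier.py or from the data set.

```python
# Attribute names in the order listed above; the last one is the class label.
ATTRIBUTES = [
    "having_IP_Address", "URL_Length", "Shortining_Service", "having_At_Symbol",
    "double_slash_redirecting", "Prefix_Suffix", "having_Sub_Domain",
    "SSLfinal_State", "Domain_registeration_length", "Favicon", "port",
    "HTTPS_token", "Request_URL", "URL_of_Anchor", "Links_in_tags", "SFH",
    "Submitting_to_email", "Abnormal_URL", "Redirect", "on_mouseover",
    "RightClick", "popUpWidnow", "Iframe", "age_of_domain", "DNSRecord",
    "web_traffic", "Page_Rank", "Google_Index", "Links_pointing_to_page",
    "Statistical_report", "Result",
]


def parse_row(line):
    """Split one comma-separated record into (features dict, label).

    Hypothetical helper: checks the value count and that every value
    is one of -1, 0 or 1, the only values used in the list above.
    """
    values = [int(v) for v in line.strip().split(",")]
    if len(values) != len(ATTRIBUTES):
        raise ValueError(f"expected {len(ATTRIBUTES)} values, got {len(values)}")
    for name, value in zip(ATTRIBUTES, values):
        if value not in (-1, 0, 1):
            raise ValueError(f"unexpected value {value} for {name}")
    features = dict(zip(ATTRIBUTES[:-1], values[:-1]))
    return features, values[-1]


# Made-up record for illustration only (not a real row from the data set).
sample = ",".join(["1"] * 30 + ["-1"])
features, label = parse_row(sample)
print(len(features), label)
```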