SMS-Classifier(-Spam-Detection-Model) : [Natural Language Processing]

The SMS Spam Detector project uses machine learning to detect spam messages. It begins by preprocessing the SMS dataset: handling text encoding, removing duplicates, and cleaning the message text. Exploratory data analysis then provides insight into the dataset's characteristics. Modeling compares algorithms such as Naive Bayes, Logistic Regression, and SVM. A Voting Classifier combining SVM, Multinomial Naive Bayes, and Extra Trees Classifier was the top performer, achieving an accuracy of 94.16% and a precision of 92.17%.

SMS Spam Detector:

Overview

This project aims to create a robust SMS spam detector using machine learning. Various classification algorithms were compared to identify the best-performing model for spam detection. The final model is then used to classify incoming SMS messages as either 'spam' or 'ham' (not spam).

Table of Contents
• Overview
• Features
• Dataset
• Data Cleaning and Pre-Processing
• Exploratory Data Analysis (EDA)
• Modeling
• Evaluation
• Best Performing Model
• Installation
• Usage
• Results
• Contributing
• License

Features
• Preprocessing of text data
• Comparison of various machine learning algorithms
• Evaluation metrics to compare models
• Selection of the best-performing model
• Deployment-ready code for SMS spam detection

Dataset

The dataset used in this project is the SMS Spam Collection Dataset, which consists of a collection of SMS messages labeled as 'spam' or 'ham'.

Data Cleaning and Pre-Processing

Loading Data: The dataset is loaded and inspected for structure and content.
Handling Encoding: Ensuring proper text encoding using the chardet library.
Removing Unnecessary Columns: Dropping columns that are not relevant for analysis.
Renaming Columns: Renaming columns for better readability.
Label Encoding: Converting categorical labels ('ham' and 'spam') into numerical values.
Removing Duplicates: Identifying and removing duplicate entries.
Text Preprocessing:
• Lowercasing text
• Tokenization
• Removing non-alphanumeric characters
• Removing stop words and punctuation
• Stemming words

Exploratory Data Analysis (EDA)

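Before moving on to the analysis, the text preprocessing steps listed above (lowercasing, tokenization, dropping non-alphanumeric characters, stop-word removal, stemming) can be illustrated with a minimal, standard-library-only sketch. It is not the project's actual implementation: the function names `transform_text` and `simple_stem`, the tiny stop-word list, and the crude suffix-stripping stemmer are all assumptions made for illustration. A real pipeline would more likely rely on a full stop-word list and a proper stemmer such as Porter.

```python
import re

# Small illustrative stop-word list (an assumption, not the project's list).
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and",
              "you", "your", "for", "in", "on"}

# Suffixes tried by the toy stemmer below, in this order.
SUFFIXES = ("ing", "ed", "es", "s")


def simple_stem(word: str) -> str:
    """Crude suffix stripping; a stand-in for a real stemmer such as Porter."""
    for suffix in SUFFIXES:
        # Only strip when at least three characters remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def transform_text(text: str) -> str:
    """Apply the listed steps and return the stems joined by spaces."""
    text = text.lower()                                  # lowercasing
    # Tokenization: keeping only runs of ASCII letters and digits also
    # discards punctuation and other non-alphanumeric characters.
    tokens = re.findall(r"[a-z0-9]+", text)
    tokens = [t for t in tokens if t not in STOP_WORDS]  # stop-word removal
    return " ".join(simple_stem(t) for t in tokens)      # stemming


print(transform_text("Congratulations! You WON 2 free tickets, claim your prizes now."))
# prints: congratulation won 2 free ticket claim priz now
```

The stems need not be dictionary words ("priz"); what matters for classification is that variants of a word map to the same token.
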
Basic Statistics: Calculating basic statistics for text length, number of words, and number of sentences.
Visualizations:
• Pie chart showing the distribution of 'ham' and 'spam'
• Histograms for the number of characters and words in messages
• Pair plot for visualizing relationships between features
• Correlation heatmap

Modeling

Various machine learning algorithms were compared, including:
• Naive Bayes (Gaussian, Multinomial, Bernoulli)
• Logistic Regression
• Support Vector Machine (SVM)
• Decision Tree
• Random Forest
• K-Nearest Neighbors (KNN)
• Gradient Boosting
• AdaBoost
• Bagging Classifier
• Extra Trees Classifier
• XGBoost

A Voting Classifier and a Stacking Classifier were also employed for combining the strengths of multiple models.

Evaluation

Each model was evaluated using the following metrics:
• Accuracy
• Precision
• Confusion Matrix
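
As a rough sketch of how a voting ensemble of SVM, Multinomial Naive Bayes, and Extra Trees might be built and scored with the metrics above, the scikit-learn example below may help. Several details are assumptions rather than the project's settings: the TF-IDF features, hard (majority) voting, the linear SVM kernel, the number of trees, and the tiny hand-written corpus, which exists only so that the snippet runs on its own. Its scores say nothing about the reported results.

```python
from sklearn.ensemble import ExtraTreesClassifier, VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVC

# Toy corpus for illustration only (1 = spam, 0 = ham).
train_texts = [
    "win a free prize now", "claim your free cash reward",
    "urgent call this number to win", "free entry in a weekly competition",
    "are we still meeting for lunch", "see you at home tonight",
    "can you call me later", "thanks for the help yesterday",
]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]
test_texts = [
    "free cash prize call now", "win a free reward",
    "lunch at home later", "thanks see you tonight",
]
test_labels = [1, 1, 0, 0]

# Turn text into numeric features (TF-IDF is an assumption here).
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_texts)
X_test = vectorizer.transform(test_texts)

# Hard voting: each model casts one vote and the majority label wins.
voting = VotingClassifier(
    estimators=[
        ("svm", SVC(kernel="linear")),
        ("nb", MultinomialNB()),
        ("et", ExtraTreesClassifier(n_estimators=50, random_state=0)),
    ],
    voting="hard",
)
voting.fit(X_train, train_labels)
predictions = voting.predict(X_test)

# The three metrics listed under Evaluation.
accuracy = accuracy_score(test_labels, predictions)
precision = precision_score(test_labels, predictions, zero_division=0)
cm = confusion_matrix(test_labels, predictions, labels=[0, 1])

print("accuracy:", accuracy)
print("precision:", precision)
print("confusion matrix (rows = true, columns = predicted):")
print(cm)
```

Precision is worth tracking alongside accuracy for a spam filter because it measures how many of the messages flagged as spam really are spam. A low value means legitimate messages are being blocked.
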

Installation

To run this project, make sure Python is installed on your machine, then install the required packages using pip:

pip install -r requirements.txt

Usage

To use the SMS spam detector, follow these steps:

Clone the repository:
git clone https://github.com/Panchadip-128/SMS-Classifier--Spam-Detection-Model-.git

Install the required packages:
pip install -r requirements.txt

Run the main script to train the model and make predictions:
python main.py

To classify a new SMS message, use the following command:
python classify.py "Your SMS message here"

Results

The best performing model, the Voting Classifier, achieved the following performance metrics on the test dataset:
• Accuracy: 94.16%
• Precision: 92.17%

Contributing

Contributions are welcome! If you have suggestions or see room for improvement, please create a pull request or open an issue. Thank you!
