Spam Classification using Logistic Regression

Overview

This project classifies emails as spam or legitimate (ham) using logistic regression. The pipeline has four steps: data preprocessing, feature extraction, model training, and evaluation.

Data Preprocessing

The emails are preprocessed with NLTK to remove stopwords, punctuation, and nonsense words.
Tokenization and lemmatization are performed to normalize the text data.
The cleaned emails are then converted into a feature matrix using a count vectorizer.
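
The preprocessing step can be sketched in plain Python as follows. This is a minimal illustration, not the project's code: the README uses NLTK, whereas this sketch substitutes a regex tokenizer, a tiny hypothetical stopword set (`STOPWORDS`), a toy lemma lookup (`LEMMAS`) in place of a real lemmatizer, and an assumed rule that treats single-letter tokens as nonsense words.

```python
import re

# Hypothetical stand-ins for NLTK's stopword list and lemmatizer (which need
# separate data downloads); these tiny tables only illustrate the idea.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "you", "your", "for", "now"}
LEMMAS = {"offers": "offer", "winners": "winner", "prizes": "prize", "claimed": "claim"}


def preprocess(text: str) -> list[str]:
    """Lowercase, tokenize, drop stopwords and junk tokens, then lemmatize."""
    # Keep runs of letters only, so punctuation and digits are discarded.
    tokens = re.findall(r"[a-z]+", text.lower())
    cleaned = []
    for tok in tokens:
        # Assumed "nonsense word" rule: single-letter tokens are dropped.
        if tok in STOPWORDS or len(tok) < 2:
            continue
        # Map to a base form when the toy lookup knows one.
        cleaned.append(LEMMAS.get(tok, tok))
    return cleaned


print(preprocess("Claim your PRIZES now!!! Winners are notified."))
# ['claim', 'prize', 'winner', 'notified']
```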

Feature Extraction

The count vectorizer converts the text into numerical features: each column corresponds to a token in the vocabulary, and each entry is the number of times that token appears in a given email.
The feature matrix is then normalized with Min-Max scaling, which rescales each feature to the range [0, 1] so that all features share a common scale.
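
Both feature-extraction steps can be sketched without third-party libraries, assuming each email has already been reduced to a list of tokens. The function names (`build_vocabulary`, `count_vectorize`, `fit_min_max`, `transform_min_max`) are hypothetical; scikit-learn offers equivalent classes (`CountVectorizer`, `MinMaxScaler`), but the README does not say which implementation the project used.

```python
from collections import Counter


def build_vocabulary(docs):
    """Assign each distinct token a column index (sorted, so the order is deterministic)."""
    return {tok: i for i, tok in enumerate(sorted({t for doc in docs for t in doc}))}


def count_vectorize(docs, vocab):
    """One row per document; entry j is how often vocabulary token j occurs in it."""
    matrix = []
    for doc in docs:
        row = [0] * len(vocab)
        for tok, n in Counter(doc).items():
            if tok in vocab:  # tokens never seen when the vocabulary was built are ignored
                row[vocab[tok]] = n
        matrix.append(row)
    return matrix


def fit_min_max(matrix):
    """Learn each column's minimum and maximum (call this on training data only)."""
    columns = list(zip(*matrix))
    return [min(col) for col in columns], [max(col) for col in columns]


def transform_min_max(matrix, mins, maxs):
    """Rescale each column with (x - min) / (max - min); constant columns become 0.0.

    Rows from new data can land outside [0, 1] if they exceed the fitted range.
    """
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0 for x, lo, hi in zip(row, mins, maxs)]
        for row in matrix
    ]


# Made-up toy emails, already tokenized.
train_docs = [["free", "prize", "prize"], ["meeting", "tomorrow"], ["free", "meeting"]]
vocab = build_vocabulary(train_docs)   # {'free': 0, 'meeting': 1, 'prize': 2, 'tomorrow': 3}
counts = count_vectorize(train_docs, vocab)   # [[1, 0, 2, 0], [0, 1, 0, 1], [1, 1, 0, 0]]
mins, maxs = fit_min_max(counts)
features = transform_min_max(counts, mins, maxs)
# [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0], [1.0, 1.0, 0.0, 0.0]]
```

Fitting the vocabulary and the scaler on the training split only, then reusing them on the test split, keeps test-set statistics out of training.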

Model Training

Logistic regression is applied to the training data to build a classification model.
The trained model is evaluated using the test data, and a classification report is generated to assess its performance.
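
To show what fitting a logistic regression involves, here is a from-scratch sketch trained by batch gradient descent on the mean log-loss. It is an illustration under assumptions: the README does not name its implementation, and library versions (for example scikit-learn's `LogisticRegression`) use other solvers and apply L2 regularization by default. The learning rate, epoch count, `l2` parameter, and toy data are hypothetical choices.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow in math.exp for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(x, w, b):
    """Model probability that x belongs to class 1."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic_regression(X, y, lr=0.5, epochs=500, l2=0.0):
    """Batch gradient descent on mean log-loss plus an optional (l2 / 2) * ||w||^2 penalty."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(xi, w, b) - yi  # derivative of log-loss w.r.t. the logit
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n  # the bias is not regularized
    return w, b


def predict(X, w, b, threshold=0.5):
    """Hard 0/1 labels using a probability threshold."""
    return [1 if predict_proba(x, w, b) >= threshold else 0 for x in X]


# Toy usage with made-up, already-scaled features (not the project's data).
X_train = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0], [0.0, 1.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0]]
y_train = [1, 0, 0, 1]
w, b = train_logistic_regression(X_train, y_train)
print(predict(X_train, w, b))  # [1, 0, 0, 1]
```

Lowering `threshold` below 0.5 is one common way to trade class 1 precision for class 1 recall, since the model outputs probabilities rather than only labels.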

Evaluation

The model's performance on the test data is as follows:

Class          Precision   Recall   F1-Score   Support
0              0.85        0.99     0.91       558
1              0.91        0.40     0.55       168
Accuracy                            0.85       726
Macro Avg      0.88        0.69     0.73       726
Weighted Avg   0.86        0.85     0.83       726

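The classification report lists precision, recall, F1-score, and support (the number of test emails in each class) for both the spam and legitimate classes. Class 0 is handled well (recall 0.99), but only 40% of class 1 emails are recovered (recall 0.40, F1-score 0.55), even though the class 1 predictions the model does make are mostly correct (precision 0.91). Because class 1 accounts for only 168 of the 726 test emails, the overall accuracy of 0.85 is driven largely by class 0; the macro-averaged F1-score of 0.73, which weights both classes equally, gives a more balanced view.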
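
The metrics in the table follow the standard definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), F1 is their harmonic mean, the macro average is the unweighted mean over classes, and the weighted average weights each class by its support. The layout matches scikit-learn's `classification_report`, although the README does not say which tool produced it. The sketch below computes the same quantities from label lists (the function name `report` is hypothetical) and then cross-checks the table's summary rows from its own rounded per-class values.

```python
def report(y_true, y_pred, labels=(0, 1)):
    """Per-class precision/recall/F1/support plus accuracy and macro/weighted averages."""
    pairs = list(zip(y_true, y_pred))
    out = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        out[c] = {"precision": prec, "recall": rec, "f1": f1, "support": tp + fn}
    n = len(pairs)
    out["accuracy"] = sum(t == p for t, p in pairs) / n
    metrics = ("precision", "recall", "f1")
    # Macro average: every class counts equally, regardless of size.
    out["macro avg"] = {m: sum(out[c][m] for c in labels) / len(labels) for m in metrics}
    # Weighted average: each class counts in proportion to its support.
    out["weighted avg"] = {
        m: sum(out[c][m] * out[c]["support"] for c in labels) / n for m in metrics
    }
    return out


# Cross-check the table's summary rows from its rounded per-class values.
table = {0: {"recall": 0.99, "f1": 0.91, "support": 558},
         1: {"recall": 0.40, "f1": 0.55, "support": 168}}
total = sum(row["support"] for row in table.values())
macro_f1 = sum(row["f1"] for row in table.values()) / len(table)
weighted_f1 = sum(row["f1"] * row["support"] for row in table.values()) / total
# Accuracy equals support-weighted recall, because recall * support = TP for each class.
accuracy = sum(row["recall"] * row["support"] for row in table.values()) / total
print(total, round(macro_f1, 2), round(weighted_f1, 2), round(accuracy, 2))  # 726 0.73 0.83 0.85
```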

Conclusion

This project demonstrates an end-to-end process for spam classification using logistic regression. By preprocessing the data, extracting meaningful features, and training a classification model, it offers a systematic approach to identifying spam emails.

Nawel-Bellil/Natural-Language-Processing-Spam-Classification
