📃: Text Classification for Spam Detection #70

Open
Avdhesh-Varshney opened this issue Jun 30, 2024 · 8 comments
Labels
Feat: Model Building
Priority: Low
Up-for-Grabs ✋ Issues are opened for the contributors

Comments

@Avdhesh-Varshney
Owner

🔴 Title : Text Classification for Spam Detection
🔴 Aim : Create a text classification system to detect spam messages using machine learning techniques.
🔴 Brief Explanation :

  • Gather a dataset of text messages labeled as spam or not spam.
  • Preprocess the text data and extract relevant features.
  • Train a machine learning model (such as Naive Bayes, SVM, or logistic regression) to classify messages as spam or not spam.
  • Develop an interface where users can input text messages and receive predictions on whether they are spam or not.
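The four steps above can be sketched end to end with only the Python standard library. The sketch parses `label<TAB>message` lines (assuming the tab-separated layout used by the UCI copy of the SMS Spam Collection), tokenizes each message, fits a multinomial Naive Bayes model with Laplace smoothing, and exposes a `predict` function that an interface could call. The sample messages and the names `load_tsv`, `train_nb`, and `predict` are hypothetical illustrations, not part of this project.

```python
import math
import re
from collections import Counter

# Hypothetical sample in the assumed "label<TAB>message" layout.
RAW = """spam\tWINNER! Claim your free prize now
spam\tFree entry: text WIN to claim cash
ham\tAre we still meeting for lunch today?
ham\tCan you send me the notes from class"""

def load_tsv(lines):
    """Parse 'label<TAB>message' lines into (label, message) pairs, skipping blank lines."""
    pairs = []
    for line in lines:
        if line.strip():
            label, message = line.rstrip("\n").split("\t", 1)
            pairs.append((label, message))
    return pairs

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def train_nb(examples, alpha=1.0):
    """Fit multinomial Naive Bayes with additive (Laplace) smoothing; returns log-probabilities."""
    class_docs = Counter(label for label, _ in examples)
    word_counts = {label: Counter() for label in class_docs}
    for label, text in examples:
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    log_prior = {c: math.log(class_docs[c] / len(examples)) for c in class_docs}
    log_lik = {}
    for c, counts in word_counts.items():
        total = sum(counts.values()) + alpha * len(vocab)
        log_lik[c] = {w: math.log((counts[w] + alpha) / total) for w in vocab}
    return log_prior, log_lik

def predict(model, text):
    """Return the label with the highest posterior log-score; words outside the vocabulary are ignored."""
    log_prior, log_lik = model
    tokens = tokenize(text)
    scores = {
        c: log_prior[c] + sum(log_lik[c][w] for w in tokens if w in log_lik[c])
        for c in log_prior
    }
    return max(scores, key=scores.get)

TRAIN = load_tsv(RAW.splitlines())
model = train_nb(TRAIN)
print(predict(model, "Claim your free cash prize"))  # → spam
print(predict(model, "See you at lunch"))            # → ham
```

To train on a real file, an open file object can be passed to `load_tsv` in place of `RAW.splitlines()`. Summing log-probabilities instead of multiplying raw probabilities avoids numeric underflow on long messages; in practice, scikit-learn's `MultinomialNB` would usually replace the hand-written model.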

Screenshots 📷

N/A


To be mentioned when taking the issue:

  • Full name :
  • What is your participant role? (Mention the Open Source Program name. Eg. GSSOC, SSOC, JWOC, etc.)

Happy Contributing 🚀

All the best. Enjoy your open source journey ahead. 😎

@sid7219

sid7219 commented Oct 2, 2024

Full name : Siddharth Gupta
GSSoC'24 Extended

@saniyaahemad12

Please assign this topic to me.

@Avdhesh-Varshney
Owner Author

@sid7219 @saniyaahemad12 Could you tell me the approach and the dataset you plan to use?

@ramu-nukavarapu

Approach:
Dataset: use the public SMS Spam Collection dataset.
Text preprocessing: use libraries such as pandas and NLTK to tokenize, lowercase, remove stop words, and apply stemming or lemmatization.
Feature extraction: convert the text into features with a TF-IDF vectorizer.
Train the model: use Naive Bayes, a common and fast choice for text classification.
User interface: build it with Streamlit components.
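The preprocessing and TF-IDF steps of this approach can be sketched with the standard library alone. The stop-word list and the `crude_stem` suffix stripper are hypothetical, deliberately tiny stand-ins for NLTK's stopword corpus and `PorterStemmer`. The IDF uses the smoothed form ln((1 + n) / (1 + df)) + 1 with L2-normalized vectors, which matches the documented defaults of scikit-learn's `TfidfVectorizer`; the function names here are illustrative only.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny stop-word list; NLTK's English stopword list is much larger.
STOP_WORDS = {"a", "an", "the", "to", "you", "your", "is", "are", "for", "now"}

def crude_stem(token):
    """Rough suffix stripping, standing in for a real stemmer such as NLTK's PorterStemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    """Lowercase, tokenize, drop stop words, then stem what remains."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]

def fit_idf(docs):
    """Smoothed IDF for every term in the corpus: ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}

def tfidf(doc, idf):
    """Raw term counts times IDF, L2-normalized; terms missing from the IDF table are dropped."""
    counts = Counter(t for t in doc if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}

corpus = [preprocess(m) for m in [
    "Claim your FREE prize now",
    "Are you joining the meeting?",
    "Free tickets waiting for you",
]]
idf = fit_idf(corpus)
print(corpus[0])  # → ['claim', 'free', 'prize']
print(tfidf(corpus[0], idf))
```

Terms that appear in fewer messages (here `claim`) receive a higher weight than widespread ones (`free`). In a real pipeline the IDF table is fitted on the training split only and reused for test messages, so the test set does not leak into the features.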

Name : Ramu
Role : GSSoC contributor

Please assign this issue to me so I can work on it!

@saniyaahemad12

Here is my approach to text classification for spam detection:
Data Collection: Use a labeled dataset like the SMS Spam Collection dataset.
Data Preprocessing:
Clean text by removing special characters and stopwords.
Tokenize and apply stemming/lemmatization.
Feature Extraction:
Use TF-IDF to convert text into numerical vectors.
Optionally, use a Bag-of-Words model.
Model Selection:
Train models using Naive Bayes, SVM, and Logistic Regression.
Model Training and Testing:
Split the data into training and testing sets and apply cross-validation.
Model Evaluation:
Use accuracy, precision, recall, and F1-score to evaluate performance.
Interface Development:
Develop a user interface with Tkinter for input and spam prediction.
If you like this approach, please assign this topic to me.
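The evaluation and cross-validation steps in this outline can be sketched with the standard library: the four listed metrics computed from true and predicted labels, plus index generation for k-fold cross-validation. Treating `"spam"` as the positive class is an assumption; with it, precision measures how many flagged messages really are spam and recall measures how many spam messages were caught. The names `evaluate` and `k_fold_indices` are hypothetical; scikit-learn's `sklearn.metrics` module and its `KFold` class cover the same ground.

```python
import random

def evaluate(y_true, y_pred, positive="spam"):
    """Accuracy, precision, recall, and F1, treating `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

def k_fold_indices(n, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation over n shuffled positions."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds of near-equal size
    for i in range(k):
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train, sorted(folds[i])

# Hypothetical labels and predictions for five messages.
y_true = ["spam", "ham", "spam", "ham", "spam"]
y_pred = ["spam", "ham", "ham", "spam", "spam"]
print(evaluate(y_true, y_pred))

for train_idx, test_idx in k_fold_indices(10, k=5):
    print(len(train_idx), test_idx)
```

When ham messages heavily outnumber spam, accuracy alone can look high even for a model that misses most spam, which is why precision, recall, and F1 are worth reporting next to it for each candidate model (Naive Bayes, SVM, Logistic Regression).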

@Avdhesh-Varshney
Owner Author

@ramu-nukavarapu In the PR, attach a screenshot of the model's accuracy during training and testing.
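One way to produce the requested training and testing accuracy is a stratified hold-out split, so that both sets keep a similar spam/ham ratio, followed by an accuracy check on each split. This is a standard-library sketch; `stratified_split`, `accuracy`, and the sample data are hypothetical, and `keyword_baseline` is a placeholder classifier included only so the example runs. The trained model's prediction function would take its place.

```python
import random
from collections import defaultdict

def stratified_split(examples, test_fraction=0.2, seed=42):
    """Split (label, text) pairs so each label keeps roughly the same share in train and test."""
    by_label = defaultdict(list)
    for ex in examples:
        by_label[ex[0]].append(ex)
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_label):  # sorted for a reproducible split
        group = by_label[label][:]
        rng.shuffle(group)
        n_test = max(1, round(len(group) * test_fraction))
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test

def accuracy(predict_fn, examples):
    """Fraction of (label, text) pairs whose predicted label matches the true label."""
    return sum(predict_fn(text) == label for label, text in examples) / len(examples)

def keyword_baseline(text):
    """Hypothetical placeholder classifier; replace with the trained model's predict function."""
    return "spam" if any(w in text.lower() for w in ("free", "win", "prize")) else "ham"

data = [("spam", f"Win a free prize {i}") for i in range(5)] + \
       [("ham", f"Lunch at {i}?") for i in range(10)]
train, test = stratified_split(data)
print(f"train accuracy: {accuracy(keyword_baseline, train):.2%}")
print(f"test accuracy:  {accuracy(keyword_baseline, test):.2%}")
```

A large gap between training and testing accuracy is a common sign of overfitting, so reporting both numbers together is more informative than either alone.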

@ramu-nukavarapu

ramu-nukavarapu commented Oct 4, 2024

@Avdhesh-Varshney okay

Please add the Hacktoberfest label as well.

@aakashmohole

I would like to solve this issue. Please assign it to me.

Name : Aakash

Projects
Status: In Progress
Development

No branches or pull requests

6 participants