Given a dataset of news articles or headlines, the goal is to classify each item as either "Fake" or "Authentic". Fake news is typically defined as news that contains false or intentionally misleading information, while authentic news contains accurate, factual information. The challenge is to develop a model that can reliably distinguish between the two.
The proposed solution is a Python-based machine learning model that preprocesses a dataset of news articles, vectorizes the text, and trains a classifier to label each article as authentic or fake. Vectorization and classification are combined in a single pipeline: TF-IDF vectorization followed by the Linear Support Vector Classification (LinearSVC) algorithm, which has shown high accuracy in detecting fake news. Exploratory Data Analysis is also performed on the dataset.
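A TF-IDF plus LinearSVC pipeline can be sketched as follows. This is a minimal illustration, not the project's actual training code: it assumes scikit-learn is the library in use, and the toy headlines, the label strings, and the `stop_words="english"` setting are made up for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy data; the real project trains on a full news dataset.
texts = [
    "Scientists confirm the moon is made of cheese",
    "Aliens endorse local mayor in surprise press conference",
    "Man wins lottery by predicting numbers with his cat",
    "Central bank holds interest rates steady amid inflation concerns",
    "Parliament passes budget bill after lengthy debate",
    "Company reports quarterly earnings in line with forecasts",
]
labels = ["Fake", "Fake", "Fake", "Authentic", "Authentic", "Authentic"]

# Chain vectorization and classification so raw text goes in and a label comes out.
model = Pipeline([
    ("tfidf", TfidfVectorizer(lowercase=True, stop_words="english")),  # assumed settings
    ("svc", LinearSVC()),
])
model.fit(texts, labels)

# Classify a new, unseen headline.
prediction = model.predict(["Central bank holds interest rates steady"])[0]
print(prediction)
```

Bundling both steps in one `Pipeline` means the fitted object can be saved and reused as a unit, and the vectorizer is always fitted on the same data as the classifier.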
- Fake: Input an article from https://www.theonion.com/
- Authentic: Input an article from https://www.reuters.com/
- Create a virtual environment.
  - This project uses the `virtualenv` package, which can be installed by running `pip install virtualenv` in the terminal.
  - Create a virtual environment by running `python -m virtualenv venv`.
  - Activate the virtual environment by running `venv\Scripts\activate` on Windows (or `source venv/bin/activate` on macOS or Linux).
- Install the required packages.
  - The packages can be installed by running `pip install -r requirements.txt`.
  - This should install the necessary packages; however, some packages may be deprecated.
- Run the cells in `prerequisites.ipynb`.
- In the terminal, run `streamlit run analysis.py` (this will take some time to run).
  - `hosted_analysis.py`: does not make use of PySpark.
  - `analysis.py`: makes use of PySpark.