
We plan on creating a machine learning program in order to use sentiment analysis to determine whether news is fake or not and use data science to guide our conclusions and explore our dataset.


hirish99/sentiment-analysis-project


sentiment-analysis-project 2020-2021

We built a machine learning program, with TF-IDF preprocessing, to determine whether a news article is fake, and used data science to guide our conclusions and explore our dataset (Fall 2020). We are currently pursuing more advanced models using PyTorch (2021). Group Members: Hirish Chandrasekaran, Isha Gokhale, Katie Huynh, Kevin Zhang, Mateo Wang, Priyasha Agarwal. Kennard Peters helped the group understand the implementation and theory behind the scikit-learn models.

Video: https://drive.google.com/file/d/1ezW-NzZMqaTOB-a-nXOfkvTgH7eWlAiB/view

General Plan

Use a dataset provided by DataFlair (https://data-flair.training/blogs/advanced-python-project-detecting-fake-news/) as a starting point for our models. Experiment with different models, starting with a Passive Aggressive Classification Algorithm in Fall. Explore more advanced models using PyTorch in Winter and Spring.
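A minimal sketch of what the Fall starting point could look like, assuming scikit-learn's TfidfVectorizer feeding a PassiveAggressiveClassifier. The toy sentences, the REAL/FAKE label strings, and the parameter values are made up for illustration and are not the project's data or settings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.pipeline import make_pipeline

# Tiny made-up corpus, for illustration only (not the project's dataset).
texts = [
    "scientists publish peer reviewed study on vaccine safety",
    "city council approves budget for new public library",
    "shocking miracle cure doctors do not want you to know",
    "you will not believe this shocking secret celebrity trick",
]
labels = ["REAL", "REAL", "FAKE", "FAKE"]

# TF-IDF turns each article into a weighted bag-of-words vector;
# the classifier then learns a linear decision boundary on those vectors.
model = make_pipeline(
    TfidfVectorizer(stop_words="english"),
    PassiveAggressiveClassifier(max_iter=50, random_state=0),
)
model.fit(texts, labels)

predictions = model.predict(texts)
print(list(zip(labels, predictions)))
```

On the real dataset the texts would be split into training and test sets before fitting, so that the reported accuracy reflects unseen articles.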

Timeline

Fall Summary

Week 3:

Explore project group ideas, look at different data sets. Compare ideas.

Week 4:

Finalize project group members, finalize data set, finalize theme and scope of project as it relates to sentiment analysis.

Week 5:

Start unpacking data, analyzing with pandas/numpy.
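A first pass at this kind of exploration might look like the sketch below. The column names `text` and `label` and the three rows are assumptions standing in for the real file, which would normally be loaded with `pd.read_csv`.

```python
import pandas as pd

# Hypothetical miniature of the dataset; the column names are assumptions.
df = pd.DataFrame({
    "text": [
        "first article body",
        "second, somewhat longer article body",
        "third",
    ],
    "label": ["REAL", "FAKE", "FAKE"],
})

# Class balance: how many articles carry each label.
label_counts = df["label"].value_counts()

# Article length in whitespace-separated words.
df["n_words"] = df["text"].str.split().str.len()

# Average article length per label.
mean_len_by_label = df.groupby("label")["n_words"].mean()

print(label_counts)
print(mean_len_by_label)
```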

Week 6 - 9:

Get acquainted with scikit-learn and divide the group in two: group (1) (Isha, Priyasha) pursued a naive Bayes classifier approach using scikit-learn, and group (2) (Hirish, Katie, Kevin) pursued a support vector classifier. Completed a working model, tuned parameters, pickled the SVC model, and committed both models to the repository in the proper branch. At the end of each meeting, both groups explained their respective model and implementation to the other group.
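The two approaches and the pickling step can be sketched as follows. This is an illustration under assumptions, not the code in the repository: the TF-IDF features, the `MultinomialNB` variant, the linear kernel, and the toy data are all choices made for the example.

```python
import pickle

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Tiny made-up corpus, for illustration only.
texts = [
    "scientists publish peer reviewed study on vaccine safety",
    "city council approves budget for new public library",
    "shocking miracle cure doctors do not want you to know",
    "you will not believe this shocking secret celebrity trick",
]
labels = ["REAL", "REAL", "FAKE", "FAKE"]

# Group (1): naive Bayes on TF-IDF features.
nb_model = make_pipeline(TfidfVectorizer(), MultinomialNB())
nb_model.fit(texts, labels)

# Group (2): support vector classifier on the same features.
svc_model = make_pipeline(TfidfVectorizer(), SVC(kernel="linear", C=1.0))
svc_model.fit(texts, labels)

# Pickle the fitted SVC pipeline (vectorizer included) and restore it.
blob = pickle.dumps(svc_model)
restored = pickle.loads(blob)

# The restored model should reproduce the original predictions.
same = list(restored.predict(texts)) == list(svc_model.predict(texts))
print(same)
```

Pickling the whole pipeline rather than the classifier alone keeps the fitted vocabulary together with the model, so the restored object can be applied directly to raw text. To store it in a file, `pickle.dump(svc_model, f)` with a file opened in binary mode does the same thing.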

Winter Plan

Week 1

Begin learning PyTorch. Tutorials here: https://pytorch.org/tutorials/, neural networks and backpropagation: https://www.deeplearningbook.org/, in-depth explanation of PyTorch functions: https://www.deeplearningwizard.com/deep_learning/boosting_models_pytorch/forwardpropagation_backpropagation_gradientdescent/. RNNs and CNNs on text data using TorchText and PyTorch: https://github.com/bentrevett/pytorch-sentiment-analysis.

Week 2

We continued to learn PyTorch, specifically gradient descent and loss functions. Simple feedforward networks and backpropagation were discussed. Tensors: https://pytorch.org/tutorials/beginner/blitz/tensor_tutorial.html#sphx-glr-beginner-blitz-tensor-tutorial-py. Tensors and autograd: https://pytorch.org/tutorials/beginner/pytorch_with_examples.html.
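As an illustration of how gradient descent, a loss function, and autograd fit together, here is a minimal sketch that fits a line to synthetic data. The target function y = 2x + 1, the learning rate, and the step count are arbitrary choices for the example.

```python
import torch

# Synthetic data drawn from y = 2x + 1 (made up for this example).
x = torch.linspace(-1, 1, 20).unsqueeze(1)
y = 2 * x + 1

# Parameters to learn; requires_grad tells autograd to track them.
w = torch.zeros(1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)

loss_fn = torch.nn.MSELoss()
lr = 0.1
losses = []

for _ in range(200):
    pred = x * w + b           # forward pass
    loss = loss_fn(pred, y)    # mean squared error
    loss.backward()            # backpropagation fills w.grad and b.grad

    # Plain gradient descent step, done outside of autograd tracking.
    with torch.no_grad():
        w -= lr * w.grad
        b -= lr * b.grad
        w.grad.zero_()
        b.grad.zero_()

    losses.append(loss.item())

print(w.item(), b.item(), losses[-1])
```

In practice `torch.optim.SGD` would replace the manual update, but writing the step out by hand shows what the optimizer does with the gradients.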

Week 3

Introduction to RNNs. Kevin/Katie attempted to process data using TorchText. Isha/Priyasha/Hirish focused on learning RNNs in PyTorch.

Week 4

Kevin/Katie implement an RNN on our DataFlair data set and get an accuracy score. Isha/Hirish continue to work on the CNN and also get an accuracy score.
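For orientation, a small RNN text classifier and an accuracy computation could be sketched as below. The class name `RNNClassifier`, the layer sizes, and the random token batch are hypothetical; the architecture actually used in the project is not described here.

```python
import torch
from torch import nn


class RNNClassifier(nn.Module):
    """Hypothetical binary text classifier: embedding -> RNN -> linear."""

    def __init__(self, vocab_size, embed_dim=32, hidden_dim=64):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.RNN(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, 1)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)    # (batch, seq, embed)
        _, hidden = self.rnn(embedded)          # hidden: (1, batch, hidden)
        return self.fc(hidden[-1]).squeeze(1)   # one logit per article


torch.manual_seed(0)
model = RNNClassifier(vocab_size=100)

# Random token ids stand in for tokenized articles (4 articles, 12 tokens each).
batch = torch.randint(0, 100, (4, 12))
labels = torch.tensor([1.0, 0.0, 1.0, 0.0])

logits = model(batch)
loss = nn.BCEWithLogitsLoss()(logits, labels)

# Accuracy: fraction of articles whose thresholded prediction matches the label.
predictions = (torch.sigmoid(logits) > 0.5).float()
accuracy = (predictions == labels).float().mean().item()
print(loss.item(), accuracy)
```

The model here is untrained, so the printed accuracy is only a demonstration of how the score is computed, not a result.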

Week 5

Kevin/Katie/Priyasha continue to improve the RNN. Isha/Hirish get an accuracy score for the CNN.
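A convolutional text classifier of the kind commonly used for this task can be sketched as follows. The class name `TextCNN`, the filter counts, and the kernel sizes are assumptions for illustration and may differ from the project's CNN.

```python
import torch
import torch.nn.functional as F
from torch import nn


class TextCNN(nn.Module):
    """Hypothetical CNN: parallel 1-D convolutions with max-over-time pooling."""

    def __init__(self, vocab_size, embed_dim=32, n_filters=16, kernel_sizes=(2, 3, 4)):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.convs = nn.ModuleList(
            [nn.Conv1d(embed_dim, n_filters, k) for k in kernel_sizes]
        )
        self.fc = nn.Linear(n_filters * len(kernel_sizes), 1)

    def forward(self, token_ids):
        # Conv1d expects (batch, channels, seq), so move the embedding dim forward.
        x = self.embedding(token_ids).transpose(1, 2)
        # Each convolution detects n-gram features; max pooling keeps the strongest.
        pooled = [F.relu(conv(x)).max(dim=2).values for conv in self.convs]
        return self.fc(torch.cat(pooled, dim=1)).squeeze(1)


torch.manual_seed(0)
cnn = TextCNN(vocab_size=100)

# Random token ids stand in for tokenized articles (4 articles, 12 tokens each).
batch = torch.randint(0, 100, (4, 12))
cnn_logits = cnn(batch)
print(cnn_logits.shape)
```

Each kernel size corresponds to an n-gram width, so the three convolutions look at 2-, 3-, and 4-token windows in parallel before the pooled features are combined into one logit per article.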

Week 6 - 9

The RNN is updated after running into bugs and the CNN has its code explained with comments. Done with our project!

Data Sources

DataFlair, Kaggle

Technologies

Python (pandas, scikit-learn, matplotlib) for the algorithms and for loading/manipulating data. PyTorch and TorchText for more customizable models.
