We built a machine learning program with TF-IDF preprocessing to determine whether a news article is fake, using data science to guide our conclusions and explore our dataset (Fall 2020). We are currently pursuing more advanced models using PyTorch (2021). Group members: Hirish Chandrasekaran, Isha Gokhale, Katie Huynh, Kevin Zhang, Mateo Wang, Priyasha Agarwal. Kennard Peters helped the group understand the implementation and theory behind the scikit-learn models.
Video: https://drive.google.com/file/d/1ezW-NzZMqaTOB-a-nXOfkvTgH7eWlAiB/view
Use a dataset provided by DataFlair (https://data-flair.training/blogs/advanced-python-project-detecting-fake-news/) as a starting point for our models. Experiment with different models, starting with a Passive Aggressive classification algorithm in Fall. Explore more advanced models using PyTorch in Winter and Spring.
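The Fall pipeline described above can be sketched as TF-IDF vectorization feeding a Passive Aggressive classifier. This is a minimal illustration, not our actual code: the toy headlines and labels below are made up, and the real project loads the DataFlair CSV instead.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.pipeline import make_pipeline

# Toy labeled headlines standing in for the DataFlair dataset.
texts = ["aliens endorse candidate", "senate passes budget bill",
         "miracle cure hidden by doctors", "court upholds state ruling"]
labels = ["FAKE", "REAL", "FAKE", "REAL"]

# TF-IDF turns each article into a sparse weight vector; the
# Passive Aggressive classifier learns a linear boundary online,
# updating only when a prediction is wrong or inside the margin.
model = make_pipeline(
    TfidfVectorizer(stop_words="english", max_df=0.7),
    PassiveAggressiveClassifier(max_iter=50, random_state=0),
)
model.fit(texts, labels)
print(model.predict(["doctors hide miracle cure"]))
```

On the real dataset, the same pipeline is fit on a train split and scored with `accuracy_score` on a held-out test split.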
Explore project group ideas and look at different datasets; compare ideas.
Finalize project group members, finalize data set, finalize theme and scope of project as it relates to sentiment analysis.
Start unpacking data, analyzing with pandas/numpy.
Get acquainted with scikit-learn and divide the group in two: group (1) (Isha, Priyasha) pursued a naive Bayes classifier approach using scikit-learn; group (2) (Hirish, Katie, Kevin) pursued a support vector classifier. Completed a working model, tuned parameters, pickled the SVC model, and committed both models to the repository in the proper branch. At the end of each meeting, each group explained its model and implementation to the other.
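The two scikit-learn approaches from that week, plus the pickling step, might look like the sketch below. It is an assumption-laden stand-in: the tiny texts are invented, and the real models were tuned on the full dataset.

```python
import pickle
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

texts = ["shocking secret they won't tell you", "parliament approves trade deal",
         "celebrity spotted with bigfoot", "central bank raises interest rates"]
labels = ["FAKE", "REAL", "FAKE", "REAL"]

# Group (1): naive Bayes on TF-IDF features.
nb = make_pipeline(TfidfVectorizer(), MultinomialNB()).fit(texts, labels)
# Group (2): linear support vector classifier on the same features.
svc = make_pipeline(TfidfVectorizer(), LinearSVC()).fit(texts, labels)

# Pickle the fitted SVC pipeline so it can be committed and reloaded later.
blob = pickle.dumps(svc)
restored = pickle.loads(blob)
print(restored.predict(["central bank raises interest rates"]))
```

Pickling the whole pipeline (vectorizer plus classifier) matters: the TF-IDF vocabulary learned at fit time must travel with the model.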
Begin learning PyTorch. Tutorials: https://pytorch.org/tutorials/; neural networks and backpropagation: https://www.deeplearningbook.org/; in-depth explanation of PyTorch functions: https://www.deeplearningwizard.com/deep_learning/boosting_models_pytorch/forwardpropagation_backpropagation_gradientdescent/; RNNs and CNNs on text data using TorchText and PyTorch: https://github.com/bentrevett/pytorch-sentiment-analysis.
We continued to learn PyTorch, specifically gradient descent and loss functions, and discussed simple feedforward networks and backpropagation. Tensors: https://pytorch.org/tutorials/beginner/blitz/tensor_tutorial.html#sphx-glr-beginner-blitz-tensor-tutorial-py. Tensors and autograd: https://pytorch.org/tutorials/beginner/pytorch_with_examples.html.
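The gradient descent and autograd material from that week can be condensed into one toy example: fitting y = 2x with a single learnable weight. This is a learning sketch, not project code; the learning rate and step count are arbitrary choices.

```python
import torch

# One-feature linear fit: learn w so that w * x matches y = 2x.
x = torch.linspace(0, 1, 20).unsqueeze(1)
y = 2.0 * x

w = torch.zeros(1, requires_grad=True)
loss_fn = torch.nn.MSELoss()

for _ in range(200):
    loss = loss_fn(x * w, y)   # forward pass: mean squared error
    loss.backward()            # autograd computes d(loss)/dw
    with torch.no_grad():
        w -= 0.5 * w.grad      # plain gradient-descent step
        w.grad.zero_()         # clear the accumulated gradient

print(w.item())  # converges to ~2.0
```

The `no_grad` block and `grad.zero_()` are the two details the tutorials stress: updates must not be tracked by autograd, and gradients accumulate unless cleared.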
Introduction to RNNs. Kevin/Katie attempted to process data with TorchText; Isha/Priyasha/Hirish focused on learning RNNs in PyTorch.
Kevin/Katie implement an RNN on our DataFlair dataset and get an accuracy score. Isha/Hirish continue to work on the CNN and also get an accuracy score.
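A minimal version of the RNN classifier, with assumed hyperparameters (vocabulary size, embedding and hidden dimensions are placeholders, not the values the group used): embed token ids, run them through an RNN, and classify from the final hidden state.

```python
import torch
import torch.nn as nn

class RNNClassifier(nn.Module):
    def __init__(self, vocab_size=100, embed_dim=16, hidden_dim=32, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.RNN(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):          # (batch, seq_len) of token ids
        embedded = self.embed(token_ids)   # (batch, seq_len, embed_dim)
        _, hidden = self.rnn(embedded)     # hidden: (1, batch, hidden_dim)
        return self.fc(hidden.squeeze(0))  # (batch, num_classes) logits

model = RNNClassifier()
batch = torch.randint(0, 100, (4, 12))  # 4 dummy "articles" of 12 token ids
logits = model(batch)
print(logits.shape)  # torch.Size([4, 2])
```

In practice the token ids come from a TorchText vocabulary built over the dataset, and the logits feed a cross-entropy loss during training.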
Kevin/Katie/Priyasha continue to improve the RNN. Isha/Hirish get an accuracy score for the CNN.
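The CNN branch can be sketched similarly, following the common 1-D convolution-over-text pattern from the tutorials linked above; the filter counts and kernel sizes here are illustrative assumptions, not the group's settings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CNNClassifier(nn.Module):
    def __init__(self, vocab_size=100, embed_dim=16, num_filters=8,
                 kernel_sizes=(2, 3, 4), num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # One Conv1d per kernel size, sliding over the token dimension.
        self.convs = nn.ModuleList(
            nn.Conv1d(embed_dim, num_filters, k) for k in kernel_sizes)
        self.fc = nn.Linear(num_filters * len(kernel_sizes), num_classes)

    def forward(self, token_ids):                  # (batch, seq_len)
        x = self.embed(token_ids).transpose(1, 2)  # (batch, embed_dim, seq_len)
        # Max-pool each feature map over time, then concatenate.
        pooled = [F.relu(conv(x)).max(dim=2).values for conv in self.convs]
        return self.fc(torch.cat(pooled, dim=1))   # (batch, num_classes) logits

model = CNNClassifier()
logits = model(torch.randint(0, 100, (4, 12)))
print(logits.shape)  # torch.Size([4, 2])
```

Max-pooling over time is what lets the model handle variable-length articles: only the strongest activation of each filter survives.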
The RNN is updated after running into bugs, and the CNN code is explained with comments. Done with our project!
DataFlair, Kaggle
Python (pandas, scikit-learn, matplotlib) for the algorithms and for loading/manipulating data; PyTorch and TorchText for more customizable models.