- Sentiment analysis is the process of determining whether a piece of writing is positive, negative or neutral.
- In this project I have demondtrated how various Machine Learning and Deep Learning models can be used for sentiment analysis.
-
The dataset used is "Sentiment Labelled Sentences Dataset", from the UC Irvine Machine Learning Repository.
-
The sentences come from three different websites/fields:
- amazon.com
- imdb.com
- yelp.com
-
Each sentence is labelled as either 1 (for positive) or 0 (for negative).
-
For each website,tThere exist 500 positive and 500 negative sentences.
-
This dataset was created for the Paper 'From Group to Individual Labels using Deep Features', Kotzias et. al,. KDD 2015. (Please cite the paper if you want to use it :))
-
Link to the dataset is: Sentiment Labelled Sentences Data Set
-
The dataset is present in the Dataset folder.
- I have used the follwoing Machine Learning models:
- Multinomial Naive bayes
- Random Forest
- LinearSVC
- The code implementing these models is in 'modules/Sentiment_Analysis_ML.ipynb'.
- All the trained models are stored at 'models/ML'. Thereafter the models are segrated as per the dataset (Amazon, IMDB, Yelp).
- I have used the follwoing Deep Learning models:
- Feed Forward Neural Network (FFNN)
- Convolutional Neural Network (CNN)
- Recurrent Neural Network (LSTM)
- As the dataset consists of three different set of data, I have created three different implementations for each of them.
- Amazon product Rreview Dataset ('modules/Amazon_Sentiment_Analysis_DL.ipynb')
- IMDB Movie Review Dataset ('modules/IMDB_Sentiment_Analysis_DL.ipynb')
- Yelp Restuarant Review Dataset ('modules/Yelp_Sentiment_Analysis_DL.ipynb')
- All the trained models are stored at 'models/DL'. Thereafter the models are segrated as per the dataset (Amazon, IMDB, Yelp).
- All the Deep Learning architectures use the GloVe Word Embeddings.
- To download click here (please download them before running the code.)
- The 6 Billion words, 100 dimensional vector representation variant is used.
- The have been stored at location 'Dataset/GloVe_Word_Embeddings'
After tyring various machine learning and deep learning models, I got the following results.
Model | Amazon Reviews | IMDB Reviews | Yelp Reviews |
---|---|---|---|
Multinomial Naive Bayes | 85% | 85% | 78% |
Random Forest | 80% | 79% | 79% |
Linear SVC | 84% | 81.50% | 80% |
FFNN | 81.50% | 84% | 82% |
CNN | 87% | 85.50% | 82.50% |
LSTM | 87% | 85% | 83% |