Our analysis of the Yelp data set to predict user sentiment from review text.
- Lowercase
- Remove numbers
- Remove stop words using NLTK
- Porter stemming
- Create a sparse matrix representation using scikit-learn
- Frequency vs. rank plot for a sample of the Yelp review dataset
- To identify stop words, we use inverse document frequency (IDF).
- To create a baseline for evaluating the algorithm, we plotted the distribution of star ratings.
- To get a better intuition for the text data, we plotted the most frequently recurring words in the reviews.
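The frequency vs. rank and inverse document frequency steps can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the notebooks' actual code: the toy reviews, the regex tokenizer, and the plain `log(N / df)` form of IDF are all assumptions made for the example.

```python
import math
import re
from collections import Counter

# Toy reviews standing in for a sample of the Yelp data (hypothetical text).
reviews = [
    "the food was great and the service was great",
    "the pizza was cold and the staff was rude",
    "great coffee and the best bagels in the city",
]

# Simple letters-only tokenizer; the notebooks may tokenize differently.
docs = [re.findall(r"[a-z]+", review.lower()) for review in reviews]

# Frequency vs. rank: sort terms by total count, most frequent first.
term_counts = Counter(tok for doc in docs for tok in doc)
ranked = term_counts.most_common()
for rank, (term, freq) in enumerate(ranked[:5], start=1):
    print(rank, term, freq)

# Inverse document frequency: idf(t) = log(N / df(t)),
# where df(t) is the number of reviews that contain t.
n_docs = len(docs)
doc_freq = Counter(tok for doc in docs for tok in set(doc))
idf = {term: math.log(n_docs / df) for term, df in doc_freq.items()}

# Terms with the lowest IDF appear in nearly every review,
# which makes them stop-word candidates.
stop_word_candidates = sorted(idf, key=idf.get)[:2]
print(stop_word_candidates)
```

Plotting `freq` against `rank` on log-log axes gives the usual frequency vs. rank view. An IDF of zero means the term occurs in every review in the sample.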
-
Bag of Words Generation - Bag of words representation of the user reviews.
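The preprocessing steps listed above (lowercasing, number removal, stop-word removal, Porter stemming) and the sparse bag-of-words matrix might fit together as in the sketch below. It is an illustration, not the notebook code: the `preprocess` helper, the sample reviews, and the short `STOP_WORDS` set are hypothetical. The stop list is a small stand-in so the example runs offline; NLTK's own list needs a one-time corpus download.

```python
import re

from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import CountVectorizer

# Small stand-in stop list so the sketch runs without downloads. With the
# NLTK corpus installed (nltk.download("stopwords")), one would use
# set(nltk.corpus.stopwords.words("english")) instead.
STOP_WORDS = {"the", "a", "an", "and", "was", "were", "is", "in", "of", "to", "it"}

stemmer = PorterStemmer()


def preprocess(text):
    """Lowercase, drop numbers, remove stop words, and Porter-stem a review."""
    text = text.lower()
    text = re.sub(r"\d+", " ", text)      # remove numbers
    tokens = re.findall(r"[a-z]+", text)  # keep alphabetic tokens only
    return [stemmer.stem(tok) for tok in tokens if tok not in STOP_WORDS]


reviews = [
    "The 2 pizzas were amazing and the staff was friendly",
    "Waited 45 minutes, the pizza was cold",
]

# Passing a callable as `analyzer` makes CountVectorizer use our tokens as-is.
vectorizer = CountVectorizer(analyzer=preprocess)
bow = vectorizer.fit_transform(reviews)  # SciPy sparse matrix, one row per review

print(bow.shape)
print(sorted(vectorizer.vocabulary_))
```

Each row of `bow` is one review and each column is one stemmed term. The matrix stays sparse, which matters once the vocabulary grows to the size of a real review corpus.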
-
Word Embeddings - Word embedding representation of the user reviews.
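One common way to turn word embeddings into a fixed-length review representation is to average the vectors of the words in the review. The sketch below assumes that approach. The README does not say which embedding model or pooling method the notebook uses, and the 3-dimensional `embeddings` table and the `review_vector` helper are made up for illustration.

```python
import numpy as np

# Hypothetical 3-dimensional embeddings. Real vectors would come from a
# trained or pretrained model, which the README does not specify.
embeddings = {
    "great": np.array([0.9, 0.1, 0.0]),
    "food": np.array([0.1, 0.8, 0.2]),
    "cold": np.array([-0.7, 0.2, 0.1]),
    "pizza": np.array([0.0, 0.9, 0.3]),
}
dim = 3


def review_vector(tokens):
    """Average the vectors of known tokens; return zeros if none are known."""
    vectors = [embeddings[tok] for tok in tokens if tok in embeddings]
    if not vectors:
        return np.zeros(dim)
    return np.mean(vectors, axis=0)


print(review_vector(["great", "food"]))
print(review_vector(["unknown"]))
```

Unlike the bag-of-words matrix, every review maps to a dense vector of the same small dimension, regardless of vocabulary size.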
-
Create models to predict sentiment based on user reviews and ratings.
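As a rough sketch of what such a model could look like, the example below trains a bag-of-words classifier with scikit-learn. The choice of logistic regression, the rule that 4 to 5 stars counts as positive, and the toy reviews are all assumptions; the README does not name the models or the labelling scheme used in the notebooks.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy (review, stars) pairs; hypothetical data, not taken from the Yelp data set.
data = [
    ("great food and friendly staff", 5),
    ("amazing pizza, will come back", 4),
    ("loved the coffee and the service", 5),
    ("cold food and rude staff", 1),
    ("terrible service, never again", 2),
    ("the pizza was awful and cold", 1),
]
texts = [text for text, _ in data]

# Assumed labelling rule: 4-5 stars is positive (1), anything lower is negative (0).
labels = [1 if stars >= 4 else 0 for _, stars in data]

# Vectorize the text and fit the classifier in one pipeline.
model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(texts, labels)

predictions = model.predict(["friendly staff and great pizza", "rude and awful"])
print(predictions)
```

On real data, accuracy should be compared against the majority-class baseline implied by the star rating distribution, using a held-out test split rather than the training reviews.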
-
Clone the repository
git clone https://github.com/hrushikesh-dhumal/Yelp-Data-Challlenge.git
-
Dependencies
Install the requirements using pip install -r requirements.txt
Anaconda is recommended, since it covers the majority of the dependencies.
The entire work is in the form of Python notebooks. Execute the notebooks in the order of their serial numbers.
Hrushikesh Dhumal
Parth Patel