Synopsis

An analysis of the Yelp dataset to predict user sentiment from review text.

Preprocessing

  1. Lowercase
  2. Remove numbers
  3. Remove stop words using nltk
  4. Porter stemming
  5. Create a sparse matrix representation using scikit-learn
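
The five preprocessing steps can be sketched in plain Python as below. This is a minimal illustration, not the code from the notebooks: the tiny stop-word list and the regex tokenizer are assumptions standing in for nltk, stemming is passed in as a function (identity by default) so that nltk's `PorterStemmer().stem` can be plugged in, and step 5 builds the three arrays of the compressed sparse row (CSR) layout used by `scipy.sparse.csr_matrix`.

```python
from __future__ import annotations

import re
from collections import Counter
from typing import Callable, Iterable

# Tiny illustrative stop-word list (assumption); the project uses nltk's list,
# e.g. nltk.corpus.stopwords.words("english").
DEFAULT_STOP_WORDS = frozenset({"the", "a", "an", "and", "is", "was", "it", "to", "of"})


def preprocess(
    text: str,
    stop_words: Iterable[str] = DEFAULT_STOP_WORDS,
    stem: Callable[[str], str] = lambda w: w,
) -> list[str]:
    """Steps 1-4: lowercase, drop numbers, drop stop words, stem."""
    text = text.lower()                    # 1. lowercase
    text = re.sub(r"\d+", " ", text)       # 2. remove numbers
    tokens = re.findall(r"[a-z']+", text)  # simple word tokenizer (assumption)
    stops = set(stop_words)
    # 3. remove stop words, then 4. stem what is left
    return [stem(t) for t in tokens if t not in stops]


def to_csr(docs: list[list[str]]):
    """Step 5: build CSR arrays (data, indices, indptr) for a document-term count matrix."""
    vocab: dict[str, int] = {}
    for tokens in docs:                    # first pass: assign a column to each term
        for t in tokens:
            vocab.setdefault(t, len(vocab))
    data: list[int] = []
    indices: list[int] = []
    indptr = [0]
    for tokens in docs:                    # second pass: one row per review
        counts = Counter(vocab[t] for t in tokens)
        for col in sorted(counts):         # keep column indices sorted within each row
            indices.append(col)
            data.append(counts[col])
        indptr.append(len(indices))
    return data, indices, indptr, vocab
```

With nltk installed (and the stop-word corpus fetched via `nltk.download("stopwords")`), the real components can be passed in, for example `preprocess(review, stop_words=stopwords.words("english"), stem=PorterStemmer().stem)`. The arrays map directly onto `scipy.sparse.csr_matrix((data, indices, indptr), shape=(len(docs), len(vocab)))`.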

Exploratory Data Analysis

  1. Plotted word frequency vs. rank for a sample of the Yelp review dataset.
  2. To identify stop words, we used inverse document frequency (IDF).
  3. To create a baseline for evaluating the algorithm, we plotted the distribution of star ratings.
  4. To get a better intuition for the text data, we plotted the most common recurring words in the reviews.
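
The exploratory statistics above can be approximated with the standard library. This sketch assumes the reviews are already tokenized (for example by the preprocessing step). It computes rank/frequency pairs for a Zipf-style plot, IDF as log(N / df) so that terms found in nearly every review (low IDF) become stop-word candidates, and the star-rating distribution, whose largest share is the accuracy of always predicting the most common rating. The IDF `threshold` is a hypothetical parameter; the cutoff used in the notebooks is not stated here.

```python
import math
from collections import Counter


def rank_frequency(docs):
    """Return (rank, frequency) pairs, most frequent word first (rank 1)."""
    counts = Counter(t for tokens in docs for t in tokens)
    freqs = sorted(counts.values(), reverse=True)
    return list(enumerate(freqs, start=1))


def idf(docs):
    """Inverse document frequency log(N / df) for every term."""
    n = len(docs)
    df = Counter(t for tokens in docs for t in set(tokens))  # count each term once per review
    return {t: math.log(n / d) for t, d in df.items()}


def stop_word_candidates(docs, threshold):
    """Terms with IDF at or below threshold, i.e. terms that appear in most reviews."""
    return sorted(t for t, v in idf(docs).items() if v <= threshold)


def star_distribution(stars):
    """Share of reviews per star rating; the largest share is a majority-class baseline."""
    counts = Counter(stars)
    total = len(stars)
    return {s: counts[s] / total for s in sorted(counts)}
```

For the plot, the pairs can be handed to matplotlib, e.g. `plt.loglog(*zip(*rank_frequency(docs)))`; on log-log axes a roughly straight line is the expected Zipf pattern for natural-language text.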

Analysis

  1. Bag of Words: bag-of-words representation of the user reviews.

  2. Word Embeddings: word embedding representation of the user reviews.

  3. Models to predict sentiment from the user review and rating:

    1. Support Vector Machine

    2. Long Short-Term Memory (LSTM) neural network
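
One way to set up the SVM model is a scikit-learn pipeline that feeds a bag-of-words matrix into a linear SVM. The labeling rule (4 to 5 stars positive, 1 to 2 stars negative, 3 stars dropped), the choice of `LinearSVC` rather than a kernel SVM, and the helper names are assumptions for illustration; the notebooks may label and tune the model differently.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def star_to_label(stars: int):
    """Hypothetical labeling rule: 4-5 stars positive, 1-2 negative, 3 dropped (None)."""
    if stars >= 4:
        return "positive"
    if stars <= 2:
        return "negative"
    return None


def train_sentiment_svm(texts, stars):
    """Fit a bag-of-words + linear SVM classifier on reviews that get a label."""
    pairs = [(t, star_to_label(s)) for t, s in zip(texts, stars)]
    pairs = [(t, y) for t, y in pairs if y is not None]  # drop neutral reviews
    X, y = zip(*pairs)
    model = make_pipeline(CountVectorizer(), LinearSVC(random_state=0))
    model.fit(list(X), list(y))
    return model
```

`model.predict([...])` then returns one label per review. Swapping `CountVectorizer` for `TfidfVectorizer` changes only the first pipeline stage.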

Installation

  1. Clone the repository

    git clone https://github.com/hrushikesh-dhumal/Yelp-Data-Challlenge.git
    
  2. Dependencies

    Install the requirements using:

    pip install -r requirements.txt

    We recommend Anaconda, which already includes most of the dependencies.

Example

The entire work is in the form of Python (Jupyter) notebooks. Execute the notebooks in order of their serial numbers.

Author Information

Hrushikesh Dhumal ([email protected])

Parth Patel ([email protected])