A sentiment analysis of Amazon Electronics product reviews using VADER and TextBlob, with an interactive visualization dashboard.
This project performs sentiment analysis on Amazon product reviews in the Electronics category. Using two NLP sentiment analyzers, VADER and TextBlob, it identifies customer sentiment patterns and presents the resulting insights through an interactive dashboard.
- Sentiment Distribution:
- Positive sentiment: 82.9% of reviews
- Neutral sentiment: 11% of reviews
- Negative sentiment: 6.1% of reviews
- Rating-Sentiment Correlation:
- Higher star ratings align strongly with positive sentiment.
- Mixed sentiments are more common in 3-star reviews.
- Category and Brand Insights:
- Certain product categories and brands exhibit consistently higher positive sentiment.
- Technical products have more detailed sentiment patterns.
- Product-Level Analysis:
- High-review-count products show balanced sentiment distribution.
- Price and technical specifications are key drivers of sentiment.
- Majority of reviews show positive sentiment (82.9%).
- Neutral reviews account for 11% of total.
- Negative reviews represent 6.1% of the dataset.
- Strong correlation between star ratings and sentiment analysis results.
- Higher star ratings consistently show more positive sentiment.
- Mixed sentiments appear more frequently in 3-star reviews.
- Certain product categories show consistently higher positive sentiment.
- Technical products tend to have more detailed and nuanced sentiment patterns.
- Price sensitivity varies significantly across categories.
- Top brands maintain consistently higher positive sentiment ratios.
- Brand sentiment varies significantly by product category.
- Customer service and product reliability are key factors in brand sentiment.
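The distribution and rating figures above boil down to counting labels. The following is a minimal sketch of that kind of tally using only the standard library. The function names and the toy rows are hypothetical, and the project's notebooks may compute these figures differently (for example with pandas).

```python
from collections import Counter, defaultdict


def sentiment_distribution(labels):
    """Return each label's share of the total as a percentage (1 decimal)."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


def sentiment_by_rating(rows):
    """Count sentiment labels per star rating from (overall, sentiment) pairs."""
    table = defaultdict(Counter)
    for overall, sentiment in rows:
        table[overall][sentiment] += 1
    return {rating: dict(counts) for rating, counts in table.items()}


# Toy rows for illustration only; not taken from the project's dataset.
sample = [
    (5, "positive"),
    (5, "positive"),
    (3, "neutral"),
    (3, "positive"),
    (1, "negative"),
]
dist = sentiment_distribution(sentiment for _, sentiment in sample)
by_rating = sentiment_by_rating(sample)
print(dist)
print(by_rating)
```

Grouping by the overall column in this way is what makes the mixed labels on 3-star reviews visible: a single rating maps to more than one sentiment label.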
Five-Star/
├── data/ # Data files and analysis results
├── docs/ # Project documentation and rubrics
├── notebooks/ # Jupyter notebooks for analysis
├── src/ # Python source code files
├── templates/ # HTML templates
├── .gitignore # Git ignore file
├── environment.yml # Conda environment configuration
├── Final_Report.pdf # Final report
├── Final_SA_Amazon_Presentation.pptx # Final presentation
└── README.md # Project documentation
Raw data source: https://jmcauley.ucsd.edu/data/amazon/index_2014.html
Processed reviews: final_sentiment_analysis_data.csv
Column Name | Description |
---|---|
reviewer_id | Unique identifier for each reviewer |
asin | Amazon product identifier |
review_text | Full text of the review |
overall | Star rating (1-5 scale) |
summary | Short review title/summary |
Column Name | Description |
---|---|
helpful | List of [helpful_votes, total_votes] |
helpful_ratio | Ratio of helpful to total votes |
unix_review_time | Review timestamp (Unix format) |
review_time | Review date (MM DD, YYYY) |
review_date | Review date (YYYY-MM-DD) |
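As a rough illustration of how the metadata columns relate to each other, the sketch below derives helpful_ratio from the helpful pair and review_date from unix_review_time. It makes two assumptions that the source does not state: reviews with zero total votes get no ratio (None), and timestamps are read as UTC. The function names are hypothetical.

```python
from datetime import datetime, timezone


def helpful_ratio(helpful):
    """Compute helpful_votes / total_votes from a [helpful_votes, total_votes] pair.

    Assumption: a review with no votes has no ratio, so None is returned.
    """
    helpful_votes, total_votes = helpful
    if total_votes == 0:
        return None
    return helpful_votes / total_votes


def to_review_date(unix_review_time):
    """Format a Unix timestamp as YYYY-MM-DD (assumption: interpreted as UTC)."""
    moment = datetime.fromtimestamp(unix_review_time, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d")


print(helpful_ratio([3, 4]))
print(to_review_date(1388534400))
```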
Column Name | Description |
---|---|
cleaned_text | Preprocessed review text |
processed_text | Tokenized/stemmed text |
review_length | Character count |
word_count | Number of words |
sentiment | Calculated sentiment (positive/neutral/negative) |
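The derived columns can be sketched in a few lines. This example assumes that review_length and word_count are computed on the raw review_text with whitespace tokenization, and that the sentiment label comes from a VADER-style compound score using the commonly cited cutoffs of 0.05 and -0.05. The project's notebooks may use different inputs or thresholds, and the function names are hypothetical.

```python
def text_features(review_text):
    """Return character and word counts (assumption: whitespace-separated words)."""
    return {
        "review_length": len(review_text),
        "word_count": len(review_text.split()),
    }


def label_from_compound(compound, threshold=0.05):
    """Map a compound score in [-1, 1] to positive / neutral / negative.

    Assumption: the conventional VADER cutoffs of +/-0.05 are used.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


features = text_features("Great sound, poor battery")
print(features)
print(label_from_compound(0.6), label_from_compound(0.0), label_from_compound(-0.3))
```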
- Python 3.8 or higher
- Conda (Anaconda/Miniconda)
- Git (for cloning the repository)
name: sentiment_analysis_env
dependencies:
- python=3.9
- pandas
- numpy
- matplotlib
- seaborn
- tqdm
- wordcloud
- flask
- scikit-learn
- spacy
- pip
- pip:
    - vaderSentiment
    - notebook
    - plotly
    - nbformat
    - textblob
- Clone the repository:
git clone https://github.com/yourusername/Five-Star.git
cd Five-Star
- Create the Conda environment:
conda env create -f environment.yml
- Activate the environment:
conda activate sentiment_analysis_env
- Install the spaCy language model:
python -m spacy download en_core_web_sm
- Verify the setup:
python -c "import pandas, numpy, matplotlib, seaborn, tqdm, wordcloud, flask, vaderSentiment, notebook, plotly, nbformat, textblob, spacy; print('Setup successful')"
- Start by exploring the Jupyter notebooks in the notebooks/ directory:
jupyter notebook
- Load and preprocess data using 02_preprocessing_reviews_data_part3.ipynb.
- Run sentiment analysis on desired product reviews using 03_sentiment_analysis_indepth_part2.ipynb.
The project includes an interactive Flask-based dashboard to visualize results.
- Overall Sentiment Distribution: Interactive pie charts showing sentiment breakdowns.
- Rating Analysis:
- Sentiment distribution across star ratings.
- Grouped bar charts showing sentiment patterns.
- Category and Brand Analysis:
- Top categories/brands by review count.
- Sentiment distribution within each category/brand.
- Product Insights:
- Top 5 positive and negative products.
- Sentiment ratios and review counts.
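The product insights panel can be approximated as a ranking by positive-review share. The sketch below is one possible way to compute it with the standard library. It is not the dashboard's actual code, and the function names, the min_reviews parameter, and the placeholder product identifiers are all hypothetical.

```python
from collections import Counter, defaultdict


def product_sentiment_ratios(reviews, min_reviews=1):
    """Aggregate (asin, sentiment) pairs into per-product counts and ratios."""
    counts = defaultdict(Counter)
    for asin, sentiment in reviews:
        counts[asin][sentiment] += 1
    stats = []
    for asin, c in counts.items():
        total = sum(c.values())
        if total >= min_reviews:  # skip products with too few reviews
            stats.append({
                "asin": asin,
                "review_count": total,
                "positive_ratio": c["positive"] / total,
            })
    return stats


def top_products(stats, n=5, most_positive=True):
    """Return the n products with the highest (or lowest) positive ratio."""
    return sorted(stats, key=lambda s: s["positive_ratio"], reverse=most_positive)[:n]


# Placeholder identifiers, not real ASINs.
sample = [
    ("A1", "positive"), ("A1", "positive"),
    ("B2", "negative"), ("B2", "positive"),
    ("C3", "negative"),
]
stats = product_sentiment_ratios(sample)
best = top_products(stats, n=2)
worst = top_products(stats, n=1, most_positive=False)
print(best)
print(worst)
```

Raising min_reviews keeps products with only one or two reviews from dominating either end of the ranking.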
- Activate the environment:
conda activate sentiment_analysis_env
- Navigate to the src directory and run:
python3 04_data_visualization_advanced_part4.py
- Sentiment classification using VADER and TextBlob.
- Word frequency analysis.
- Brand and product category sentiment trends.
- Temporal sentiment analysis.
- Review helpfulness correlation.
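For the temporal analysis feature, one simple approach is to bucket reviews by month and track the positive share over time. The sketch below assumes review_date strings in YYYY-MM-DD form, as in the processed dataset, and monthly granularity. The function name is hypothetical and the project may aggregate differently.

```python
from collections import Counter, defaultdict


def monthly_positive_share(rows):
    """Return {YYYY-MM: share of positive reviews} from (review_date, sentiment) pairs."""
    buckets = defaultdict(Counter)
    for review_date, sentiment in rows:
        month = review_date[:7]  # "YYYY-MM-DD" -> "YYYY-MM"
        buckets[month][sentiment] += 1
    return {
        month: counts["positive"] / sum(counts.values())
        for month, counts in sorted(buckets.items())
    }


# Toy rows for illustration only.
rows = [
    ("2013-01-05", "positive"),
    ("2013-01-20", "negative"),
    ("2013-02-02", "positive"),
]
trend = monthly_positive_share(rows)
print(trend)
```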
This project is part of the DS5110 course at Northeastern University.