This analysis examines product reviews written by members of the paid Amazon Vine program and determines whether they are favorably biased compared with reviews written by non-Vine customers.
The dataset explored in this project contains reviews written for video games. It was taken from the Amazon Review datasets, which list reviews for about 50 different product categories. The schema of the dataset is as follows:
- Amazon Web Services (AWS)
- PySpark
- Python (Pandas)
- SQL
- pgAdmin
- Google Colab notebook
- Jupyter notebook
- VS Code
This project runs the analysis using three different technologies (for experimentation's sake; any one of them produces sufficient results):
- PySpark
- Python (Pandas)
- SQL
PySpark is used to perform the ETL (Extract, Transform, and Load) process: it extracts the chosen dataset, transforms it, connects to an AWS RDS (Relational Database Service) instance, and loads the transformed data into a PostgreSQL database managed through pgAdmin. Next, PySpark, Pandas, or SQL is used to analyze the data and determine whether Vine members are biased towards writing favorable reviews compared with non-Vine members.
Amazon_Reviews_ETL.ipynb shows how the ETL process was used to extract the video games dataset, transform it into four different tables, and then load these four tables into the database viewed through pgAdmin. The four tables created are: customers_table, products_table, review_id_table, and vine_table.
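As a rough illustration of the transform step, the sketch below splits raw review records into the four tables using plain Python. The field names follow the public Amazon review dataset layout, but the exact columns kept in each table are an assumption here, and the sample rows are invented for illustration. The notebook itself performs this step with PySpark.

```python
from collections import Counter

# Invented sample rows; field names are assumed from the public dataset layout.
raw_reviews = [
    {"review_id": "R1", "customer_id": 101, "product_id": "B001",
     "product_parent": 9001, "product_title": "Game A",
     "review_date": "2015-08-31", "star_rating": 5, "helpful_votes": 3,
     "total_votes": 4, "vine": "N", "verified_purchase": "Y"},
    {"review_id": "R2", "customer_id": 101, "product_id": "B002",
     "product_parent": 9002, "product_title": "Game B",
     "review_date": "2015-08-30", "star_rating": 2, "helpful_votes": 0,
     "total_votes": 1, "vine": "N", "verified_purchase": "Y"},
    {"review_id": "R3", "customer_id": 102, "product_id": "B001",
     "product_parent": 9001, "product_title": "Game A",
     "review_date": "2015-08-29", "star_rating": 4, "helpful_votes": 7,
     "total_votes": 9, "vine": "Y", "verified_purchase": "N"},
]

REVIEW_ID_COLS = ("review_id", "customer_id", "product_id",
                  "product_parent", "review_date")
VINE_COLS = ("review_id", "star_rating", "helpful_votes",
             "total_votes", "vine", "verified_purchase")


def build_tables(rows):
    """Split raw rows into four tables (the column choices are assumptions)."""
    # One row per customer, with the number of reviews that customer wrote.
    review_counts = Counter(r["customer_id"] for r in rows)
    customers_table = [
        {"customer_id": cid, "customer_count": n}
        for cid, n in sorted(review_counts.items())
    ]

    # One row per distinct product.
    titles = {}
    for r in rows:
        titles.setdefault(r["product_id"], r["product_title"])
    products_table = [
        {"product_id": pid, "product_title": title}
        for pid, title in titles.items()
    ]

    # One row per review, projected onto the columns each table needs.
    review_id_table = [{c: r[c] for c in REVIEW_ID_COLS} for r in rows]
    vine_table = [{c: r[c] for c in VINE_COLS} for r in rows]

    return {
        "customers_table": customers_table,
        "products_table": products_table,
        "review_id_table": review_id_table,
        "vine_table": vine_table,
    }


tables = build_tables(raw_reviews)
```

Keeping the projection columns in named tuples makes it easy to adjust the table layout if the real schema differs from the one assumed here.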
Next, an analysis is run to determine if there is any favorable bias among the paid Vine reviews. To this end, Vine_Review_Analysis.ipynb uses PySpark, Vine_Review_Analysis_pandas.ipynb uses Python's Pandas library, and Vine_Review_Analysis.sql uses SQL to run the same analysis. Any one of these files can be viewed to understand the process.
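The SQL variant of the analysis can be sketched as a single grouped query. The example below is a minimal illustration, not the contents of Vine_Review_Analysis.sql: it uses Python's built-in sqlite3 module as a stand-in for the PostgreSQL database, assumes that vine_table has star_rating and vine columns with 'Y'/'N' flags, and runs on a handful of invented rows.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the real PostgreSQL instance.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vine_table ("
    "review_id TEXT PRIMARY KEY, star_rating INTEGER, vine TEXT)"
)

# Invented sample rows, for illustration only.
sample_rows = [
    ("R1", 5, "Y"), ("R2", 3, "Y"),
    ("R3", 5, "N"), ("R4", 4, "N"), ("R5", 1, "N"), ("R6", 2, "N"),
]
conn.executemany("INSERT INTO vine_table VALUES (?, ?, ?)", sample_rows)

# Count all reviews and five-star reviews per group, then derive the percentage.
query = """
SELECT vine,
       COUNT(*) AS total_reviews,
       SUM(CASE WHEN star_rating = 5 THEN 1 ELSE 0 END) AS five_star_reviews,
       ROUND(100.0 * SUM(CASE WHEN star_rating = 5 THEN 1 ELSE 0 END)
             / COUNT(*), 2) AS five_star_pct
FROM vine_table
GROUP BY vine
ORDER BY vine
"""
summary = conn.execute(query).fetchall()
conn.close()

for vine_flag, total, five_star, pct in summary:
    print(vine_flag, total, five_star, pct)
```

Grouping on the vine flag yields both the Vine and non-Vine figures in one pass, which avoids creating two separate tables when only the summary numbers are needed.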
One table was created for Vine-only reviews, and one for non-Vine-only reviews. There are a total of 40,565 reviews in the original vine_table, of which 94 are Vine reviews and 40,471 are non-Vine reviews.
There are 15,711 five-star reviews in all: 48 five-star Vine reviews and 15,663 five-star non-Vine reviews.
The percentage of five-star Vine reviews is 51.06%, and the percentage of five-star non-Vine reviews is 38.70%.
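The percentages follow directly from the counts reported above. The short check below reproduces the arithmetic; the helper name pct is purely illustrative.

```python
# Counts taken from the results reported in this write-up.
vine_total, vine_five_star = 94, 48
non_vine_total, non_vine_five_star = 40_471, 15_663


def pct(part, whole):
    """Return part / whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


vine_pct = pct(vine_five_star, vine_total)
non_vine_pct = pct(non_vine_five_star, non_vine_total)

print(f"Vine five-star share:     {vine_pct:.2f}%")
print(f"Non-Vine five-star share: {non_vine_pct:.2f}%")
```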
Since the percentage of five-star reviews by Vine members (51.06%) is greater than the percentage of five-star reviews by non-Vine members (38.70%), we can conclude that there is in fact a positivity bias for reviews in the Vine program. A further analysis that could shed more light on this is to compare the percentage of reviews with three stars or lower (or two stars or lower) written by Vine versus non-Vine members.
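The suggested follow-up could look something like the sketch below, which computes the share of reviews at or below a chosen star threshold for each group. The function name, the 'Y'/'N' vine flags, and the sample rows are assumptions made for illustration; no such result has been computed for the real dataset here.

```python
def share_at_or_below(reviews, max_stars):
    """Return {vine_flag: percent of reviews with star_rating <= max_stars}."""
    totals, hits = {}, {}
    for review in reviews:
        flag = review["vine"]
        totals[flag] = totals.get(flag, 0) + 1
        if review["star_rating"] <= max_stars:
            hits[flag] = hits.get(flag, 0) + 1
    return {
        flag: round(100 * hits.get(flag, 0) / count, 2)
        for flag, count in totals.items()
    }


# Invented sample rows, for illustration only.
sample_reviews = [
    {"vine": "Y", "star_rating": 5},
    {"vine": "Y", "star_rating": 3},
    {"vine": "N", "star_rating": 5},
    {"vine": "N", "star_rating": 4},
    {"vine": "N", "star_rating": 1},
    {"vine": "N", "star_rating": 2},
]

three_or_lower = share_at_or_below(sample_reviews, max_stars=3)
two_or_lower = share_at_or_below(sample_reviews, max_stars=2)
```

Passing the threshold as a parameter lets the same function cover both the three-stars-or-lower and the two-stars-or-lower comparisons.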