
In this project, supervised machine learning methods are used to predict whether a previously unseen Amazon Kindle book review is positive or negative, based on its text content. The text analysis is done in Python and is performed on 1,000,000 reviews available at https://www.kaggle.com/datasets/bharadwaj6/kindle-reviews.
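To make the supervised setup concrete, here is a minimal sketch of a text classifier trained on labeled, tokenized reviews. The project's actual model is not described here, so the multinomial Naive Bayes classifier, the function names `train_naive_bayes` and `predict`, and the toy training data are all assumptions chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    """Count documents per class and words per class (illustrative stand-in model)."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(model, tokens):
    """Return the class with the highest log-probability for a tokenized review."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # class prior
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            if tok in vocab:  # words never seen in training are ignored
                # Laplace (add-one) smoothing avoids zero probabilities
                score += math.log(
                    (word_counts[label][tok] + 1) / (total_words + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy, hand-written examples (not taken from the Kindle data set)
train_docs = [
    ["great", "story", "loved", "it"],
    ["wonderful", "characters", "great", "plot"],
    ["boring", "plot", "waste", "of", "time"],
    ["terrible", "writing", "boring", "story"],
]
train_labels = ["positive", "positive", "negative", "negative"]

model = train_naive_bayes(train_docs, train_labels)
prediction = predict(model, ["great", "characters"])
print(prediction)
```

On the real data, the labels would have to be derived from the reviews themselves, and any other supervised classifier could take the place of this one.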

The text preprocessing requires the nltk module, which may need to be installed manually. Note that the code in this repository (kindle_analysis.ipynb) was developed for Google Colab. If you want to run it unchanged, make sure that the data is stored in a folder called "data".
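A possible setup, sketched under several assumptions: the file name `kindle_reviews.csv`, the column names `reviewText` and `overall`, and the nltk resources `stopwords` and `punkt` are guesses about what the notebook needs, not something stated here. The helper names `ensure_nltk_resources`, `parse_reviews`, and `load_reviews` are hypothetical.

```python
import csv
from pathlib import Path

DATA_DIR = Path("data")          # folder name taken from the text above
CSV_NAME = "kindle_reviews.csv"  # assumed file name inside that folder


def ensure_nltk_resources(resources=("stopwords", "punkt")):
    """Download nltk resources; the resource list is an assumption."""
    import nltk  # imported lazily; install with `pip install nltk` if missing

    for name in resources:
        nltk.download(name, quiet=True)


def parse_reviews(lines, text_col="reviewText", rating_col="overall"):
    """Turn CSV lines into (text, rating) pairs, skipping empty reviews."""
    pairs = []
    for row in csv.DictReader(lines):
        text = (row.get(text_col) or "").strip()
        if text:
            pairs.append((text, float(row[rating_col])))
    return pairs


def load_reviews(path=DATA_DIR / CSV_NAME):
    """Read the review file from the data folder."""
    with open(path, newline="", encoding="utf-8") as handle:
        return parse_reviews(handle)
```

Calling `ensure_nltk_resources()` once per Colab session and then `load_reviews()` would give a list of review texts with their star ratings, ready for preprocessing.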

The most important preprocessing steps are performed by the prep function. The order of the steps matters, especially because step three changes the structure of the data substantially: before step three we are dealing with a list of strings, whereas after step three each string has been decomposed into a list of words.
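The following is a minimal sketch of what such a function could look like, not the actual prep function. Only the change in step three (strings become lists of words) comes from the description above; the other steps, the regular expression, and the small `STOP_WORDS` set (a stand-in for the nltk stop word list) are assumptions.

```python
import re

# Tiny illustrative stop word set; the real code presumably uses nltk's list
STOP_WORDS = {"a", "an", "and", "the", "is", "it", "this", "was", "i", "of", "to"}


def prep(reviews, stop_words=STOP_WORDS):
    """Hypothetical preprocessing pipeline for a list of review strings."""
    # Step 1: lowercase every review (still a list of strings)
    texts = [review.lower() for review in reviews]
    # Step 2: replace everything except letters and whitespace with a space
    texts = [re.sub(r"[^a-z\s]", " ", text) for text in texts]
    # Step 3: split each string into words (now a list of lists of words)
    tokens = [text.split() for text in texts]
    # Step 4: drop stop words; this only works on word lists, so it must follow step 3
    return [[word for word in doc if word not in stop_words] for doc in tokens]


sample = ["I loved this book!", "The plot was boring... 2 stars."]
cleaned = prep(sample)
print(cleaned)  # [['loved', 'book'], ['plot', 'boring', 'stars']]
```

The sketch shows why the order matters: steps one and two operate on whole strings, while step four expects the word lists that only exist after step three.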

The prep function is a work in progress and is updated continuously. If you are interested in its current state, just contact me. I'm happy to share my code and always welcome comments.

MarkMH/kindle_reviews
