NLP & Clustering: Movie Similarity from Plot Summaries

Project Description

Natural Language Processing (NLP) is the field of study in which data scientists develop algorithms that make sense of the conversational language humans use. In this project, I use NLP to measure how similar movies are to one another, based on their plot summaries from IMDb and Wikipedia.

Dataset

The dataset contains the titles of the top 100 movies on IMDb as well as each movie's plot summary from both IMDb and Wikipedia.

Objective

For each movie, find the three most similar movies within the same cluster.
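
The objective above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the toy titles and plots, the helper name top_similar_in_cluster, and the parameter values (two clusters, English stop words, a fixed random seed) are all assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical stand-ins for the real movie titles and plot summaries.
titles = ["Space A", "Space B", "Space C", "Space D",
          "Heist A", "Heist B", "Heist C", "Heist D"]
plots = [
    "An astronaut pilots a spaceship to a distant planet.",
    "The spaceship crew lands on a frozen planet to rescue an astronaut.",
    "A stranded astronaut repairs the spaceship and leaves the planet.",
    "Two astronauts race a rival spaceship toward a dying planet.",
    "A gang of thieves plans a bank robbery and cracks the vault.",
    "Retired thieves return for one last robbery of a casino vault.",
    "The bank vault is emptied by thieves during a citywide blackout.",
    "A detective hunts the thieves behind a daring bank robbery.",
]


def top_similar_in_cluster(titles, plots, n_clusters=2, top_n=3, random_state=0):
    """Return, for each title, its top_n most similar titles in the same cluster."""
    # Turn each plot into a TF-IDF vector.
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(plots)
    # Assign every movie to one of n_clusters K-Means clusters.
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=random_state).fit_predict(tfidf)
    # Pairwise cosine similarity between all plot vectors.
    sim = cosine_similarity(tfidf)

    results = {}
    for i, title in enumerate(titles):
        # Candidates: every other movie that shares this movie's cluster label.
        peers = [j for j in range(len(titles)) if j != i and labels[j] == labels[i]]
        peers.sort(key=lambda j: sim[i, j], reverse=True)
        results[title] = [(titles[j], float(sim[i, j])) for j in peers[:top_n]]
    return results, labels


recommendations, labels = top_similar_in_cluster(titles, plots)
for title, recs in recommendations.items():
    print(title, "->", [name for name, _ in recs])
```

A movie in a cluster with fewer than four members gets fewer than three recommendations, so the number of clusters is worth tuning against the size of the dataset.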

Tools and Libraries

Tokenization and Lemmatization (spaCy)
TF-IDF (scikit-learn)
KMeans (scikit-learn)
Cosine Similarity / Similarity Score
Hierarchical Clustering (SciPy)
Seaborn / Matplotlib
Pandas
NumPy
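
As a rough illustration of how the TF-IDF, cosine similarity, and SciPy hierarchical clustering tools listed above can fit together, here is a small sketch. The toy plots, the "average" linkage method, and the choice of two flat clusters are assumptions for the example and may differ from what the notebook does. The dendrogram is computed with no_plot=True so the snippet runs without Matplotlib; drop that argument to draw it.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical stand-ins for the real movie titles and plot summaries.
titles = ["Space A", "Space B", "Space C", "Heist A", "Heist B", "Heist C"]
plots = [
    "An astronaut pilots a spaceship to a distant planet.",
    "The spaceship crew lands on a frozen planet to rescue an astronaut.",
    "A stranded astronaut repairs the spaceship and leaves the planet.",
    "A gang of thieves plans a bank robbery and cracks the vault.",
    "Retired thieves return for one last robbery of a casino vault.",
    "The bank vault is emptied by thieves during a citywide blackout.",
]

# TF-IDF vectors, then cosine distance = 1 - cosine similarity.
tfidf = TfidfVectorizer(stop_words="english").fit_transform(plots)
distance = 1.0 - cosine_similarity(tfidf)

# Clean up floating-point noise so the matrix is a valid distance matrix.
distance = np.clip((distance + distance.T) / 2.0, 0.0, None)
np.fill_diagonal(distance, 0.0)

# linkage expects the condensed (upper-triangle) form of the matrix.
condensed = squareform(distance, checks=False)
merges = linkage(condensed, method="average")

# Leaf order of the dendrogram, computed without drawing anything.
tree = dendrogram(merges, labels=titles, no_plot=True)
print("Leaf order:", tree["ivl"])

# Cut the tree into at most two flat clusters.
flat = fcluster(merges, t=2, criterion="maxclust")
print(dict(zip(titles, flat.tolist())))
```

Average linkage is used here because it works with an arbitrary precomputed distance, whereas Ward linkage assumes Euclidean distances.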

