A Python-based movie recommendation system leveraging NLP and clustering techniques. This project includes data processing, vectorization of plot summaries, and the implementation of recommendation algorithms to suggest similar movies based on user input.

GabrielMazzotta/NLP-Clustering--Movie-Similarity-from-Plot-Summaries

NLP & Clustering - Movie Similarity from Plot Summaries

Project Description

Natural Language Processing (NLP) is a field in which data scientists develop algorithms that make sense of the conversational language humans use. In this project, I use NLP to measure the degree of similarity between movies based on their plot summaries from IMDb and Wikipedia.

Dataset

The dataset contains the titles of the top 100 movies on IMDb as well as each movie's plot summary from both IMDb and Wikipedia.

Objective

For a given movie, find the 3 most similar movies within the same cluster.
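
The selection step in this objective can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `top_similar_in_cluster`, the toy similarity matrix, and the cluster labels are all hypothetical, and it assumes a pairwise similarity matrix and one cluster label per movie have already been computed.

```python
def top_similar_in_cluster(index, similarity, labels, k=3):
    """Return indices of the k movies most similar to `index`,
    restricted to movies that share its cluster label."""
    # Candidates: every other movie assigned to the same cluster.
    candidates = [
        j for j in range(len(labels))
        if j != index and labels[j] == labels[index]
    ]
    # Highest similarity first.
    candidates.sort(key=lambda j: similarity[index][j], reverse=True)
    return candidates[:k]


# Hypothetical toy data: 5 movies, symmetric similarity scores in [0, 1].
similarity = [
    [1.0, 0.8, 0.1, 0.6, 0.7],
    [0.8, 1.0, 0.2, 0.5, 0.4],
    [0.1, 0.2, 1.0, 0.3, 0.9],
    [0.6, 0.5, 0.3, 1.0, 0.2],
    [0.7, 0.4, 0.9, 0.2, 1.0],
]
labels = [0, 0, 1, 0, 0]  # movie 2 sits alone in cluster 1

print(top_similar_in_cluster(0, similarity, labels))
print(top_similar_in_cluster(2, similarity, labels))
```

Restricting candidates to the same cluster means a movie that is alone in its cluster gets no recommendations, even if a movie in another cluster scores highly against it (as movie 4 does against movie 2 in the toy data).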

Tools and Libraries

  • Tokenization and Lemmatization (spaCy)
  • TF-IDF (scikit-learn)
  • KMeans
  • Cosine Similarity / Similarity Score
  • Hierarchical Clustering (SciPy)
  • Seaborn / Matplotlib
  • Pandas
  • NumPy
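
To make the TF-IDF and cosine similarity steps listed above concrete, here is a dependency-free sketch of what those two steps compute. It is not the project's implementation, which uses scikit-learn. The function names and the three short "plots" are invented for illustration, tokenization is a plain whitespace split rather than spaCy lemmatization, and the IDF weight uses the smoothed form ln((1 + n) / (1 + df)) + 1.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term counts within this document
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical plot snippets, not taken from the dataset.
plots = [
    "astronaut travels space ship",
    "alien ship visits earth space",
    "detective solves murder city",
]
vecs = tfidf_vectors(plots)
sim_space = cosine_similarity(vecs[0], vecs[1])  # share "space" and "ship"
sim_crime = cosine_similarity(vecs[0], vecs[2])  # share no terms
print(sim_space > sim_crime)
```

The resulting pairwise scores are the kind of input that KMeans or hierarchical clustering would then group. In practice the vectors are typically held as a matrix rather than as dictionaries.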
