This repository contains implementations of various movie recommendation algorithms including Content-Based, Collaborative Filtering, and Hybrid methods. These algorithms are implemented using Python and popular libraries such as scikit-learn, Surprise, and Streamlit.
Movie recommendation systems are widely used to suggest movies to users based on their preferences. This repository provides implementations of different recommendation algorithms:
- Content-Based Filtering: Recommends movies similar to a given movie based on their features such as genres, cast, and keywords.
- Collaborative Filtering: Recommends movies to a user based on the preferences of similar users or items.
- Hybrid Filtering: Combines content-based and collaborative filtering methods to provide personalized recommendations.
Content-based filtering recommends items similar to what the user has liked in the past. It uses item features to make recommendations. In this repository, content-based filtering is implemented using:
- CountVectorizer: Converts text data (movie descriptions, genres, cast, etc.) into a matrix of token counts.
- TF-IDF Vectorizer: Converts text data into TF-IDF (Term Frequency-Inverse Document Frequency) vectors to reflect the importance of words in documents.
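As a rough illustration of the two vectorization ideas above, here is a minimal pure-Python sketch that builds token counts, reweights them with a smoothed TF-IDF, and ranks movies by cosine similarity. It is not the repository's code. The movie texts and the helper names (`count_vector`, `tfidf_vectors`, `similar_movies`) are made up for the example, and scikit-learn's `CountVectorizer` and `TfidfVectorizer` do considerably more (tokenization rules, stop words, sparse matrices, normalization).

```python
import math
from collections import Counter

# Hypothetical "metadata soup" per movie: genres, cast and keywords joined as text.
movies = {
    "Toy Story": "animation comedy family tomhanks toy friendship",
    "Toy Story 2": "animation comedy family tomhanks toy rescue",
    "Heat": "action crime thriller alpacino heist",
    "Jumanji": "adventure fantasy family robinwilliams boardgame",
}


def count_vector(text):
    """Token counts for one document (one row of a count matrix)."""
    return Counter(text.lower().split())


def tfidf_vectors(docs):
    """Weight token counts by a smoothed inverse document frequency."""
    n_docs = len(docs)
    counts = {title: count_vector(text) for title, text in docs.items()}
    # Document frequency: in how many documents does each token appear?
    doc_freq = Counter(tok for c in counts.values() for tok in c)
    idf = {tok: math.log((1 + n_docs) / (1 + df)) + 1 for tok, df in doc_freq.items()}
    return {
        title: {tok: tf * idf[tok] for tok, tf in c.items()}
        for title, c in counts.items()
    }


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_movies(title, vectors, top_n=3):
    """Return the top_n most similar titles with their similarity scores."""
    target = vectors[title]
    scores = [(other, cosine(target, vec)) for other, vec in vectors.items() if other != title]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


vectors = tfidf_vectors(movies)
for other, score in similar_movies("Toy Story", vectors):
    print(f"{other}: {score:.3f}")
```

Tokens shared by many movies (such as `family` here) receive a lower weight than rare ones, which is the practical difference between the count-based and TF-IDF variants.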
Collaborative filtering recommends items based on the preferences of similar users or items. In this repository, collaborative filtering is implemented using:
- Singular Value Decomposition (SVD): Decomposes the user-item interaction matrix to find latent factors representing user preferences and item features.
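To make the latent-factor idea concrete, the sketch below trains a small biased matrix factorization with stochastic gradient descent, which is the general family of model that Surprise's `SVD` belongs to. It is a simplified illustration and not the repository's or Surprise's implementation. The ratings, the hyperparameters, and the function names (`train_mf`, `predict`, `rmse`) are assumptions chosen for the example.

```python
import math
import random

# Hypothetical (user, movie, rating) triples on a 1-5 scale.
ratings = [
    ("u1", "m1", 5), ("u1", "m2", 4), ("u1", "m3", 1),
    ("u2", "m1", 4), ("u2", "m3", 1), ("u2", "m4", 2),
    ("u3", "m2", 1), ("u3", "m3", 5), ("u3", "m4", 4),
    ("u4", "m1", 1), ("u4", "m3", 4), ("u4", "m4", 5),
]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def train_mf(ratings, n_factors=2, n_epochs=500, lr=0.02, reg=0.02, seed=0):
    """Learn a global mean, user/item biases and latent factors with SGD."""
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)
    users = sorted({u for u, _, _ in ratings})  # sorted for reproducible init
    items = sorted({i for _, i, _ in ratings})
    bu = {u: 0.0 for u in users}
    bi = {i: 0.0 for i in items}
    p = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in items}

    for _ in range(n_epochs):
        for u, i, r in ratings:
            err = r - (mu + bu[u] + bi[i] + dot(p[u], q[i]))
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            for f in range(n_factors):
                pf, qf = p[u][f], q[i][f]
                p[u][f] += lr * (err * qf - reg * pf)
                q[i][f] += lr * (err * pf - reg * qf)
    return mu, bu, bi, p, q


def predict(model, user, item):
    """Estimated rating, clipped to the 1-5 scale; unknown ids fall back to biases."""
    mu, bu, bi, p, q = model
    est = mu + bu.get(user, 0.0) + bi.get(item, 0.0)
    if user in p and item in q:
        est += dot(p[user], q[item])
    return min(5.0, max(1.0, est))


def rmse(model, ratings):
    """Root mean squared error of the model on the given ratings."""
    return math.sqrt(sum((r - predict(model, u, i)) ** 2 for u, i, r in ratings) / len(ratings))


model = train_mf(ratings)
print("training RMSE:", round(rmse(model, ratings), 3))
print("u1 on unseen m4:", round(predict(model, "u1", "m4"), 2))
```

The prediction for a pair the user has not rated (`u1`, `m4`) comes entirely from the learned biases and factors, which is what allows collaborative filtering to rank unseen movies.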
Hybrid filtering combines content-based and collaborative filtering methods to provide more accurate and personalized recommendations. In this repository, hybrid filtering is implemented by combining content-based and collaborative filtering algorithms.
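The text does not spell out how the two recommenders are combined, so the following is only one plausible arrangement: shortlist movies by content similarity to a seed title, then re-rank that shortlist by the rating a collaborative model predicts for the user. All names and numbers here (`hybrid_recommend`, `content_scores`, `predicted`) are hypothetical stand-ins for the outputs of the content-based and collaborative components.

```python
# Hypothetical content similarities to a seed movie (e.g. cosine scores).
content_scores = {
    "Toy Story": {
        "Toy Story 2": 0.9,
        "Monsters, Inc.": 0.7,
        "Jumanji": 0.4,
        "Heat": 0.0,
    }
}

# Hypothetical ratings predicted by a collaborative model for user "u1".
predicted = {
    ("u1", "Toy Story 2"): 3.5,
    ("u1", "Monsters, Inc."): 4.6,
    ("u1", "Jumanji"): 4.1,
    ("u1", "Heat"): 4.9,
}


def predict_rating(user_id, title):
    """Stand-in for a trained collaborative-filtering model."""
    return predicted.get((user_id, title), 0.0)


def hybrid_recommend(seed_title, user_id, shortlist_size=3, top_n=2):
    # Step 1: content-based shortlist of the movies most similar to the seed.
    similar = sorted(
        content_scores[seed_title].items(), key=lambda pair: pair[1], reverse=True
    )[:shortlist_size]
    # Step 2: re-rank the shortlist by the user's predicted rating.
    ranked = sorted(
        (title for title, _ in similar),
        key=lambda title: predict_rating(user_id, title),
        reverse=True,
    )
    return ranked[:top_n]


print(hybrid_recommend("Toy Story", "u1"))  # ['Monsters, Inc.', 'Jumanji']
```

In this arrangement "Heat" never reaches the user despite its high predicted rating, because it is filtered out by the content step. A weighted blend of the two scores is another common design with different trade-offs.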
- The full dataset: This dataset consists of 26,000,000 ratings and 750,000 tag applications applied to 45,000 movies by 270,000 users. It also includes tag genome data with 12 million relevance scores across 1,100 tags.
- The small dataset: This dataset comprises 100,000 ratings and 1,300 tag applications applied to 9,000 movies by 700 users.
- Download the dataset using this link
- Put the CSV files inside the `data` folder.
- The following files are present in the dataset: `credits.csv`, `keywords.csv`, `links.csv`, `links_small.csv`, `movies_metadata.csv`, `ratings.csv`, `ratings_small.csv`
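Before launching the app, it can help to confirm that the files landed in the right place. The short check below is an optional sketch and not part of the repository. It assumes the folder is named `data` relative to the current directory, and `missing_files` is a hypothetical helper name.

```python
from pathlib import Path

# File names as listed for the dataset.
EXPECTED_FILES = [
    "credits.csv",
    "keywords.csv",
    "links.csv",
    "links_small.csv",
    "movies_metadata.csv",
    "ratings.csv",
    "ratings_small.csv",
]


def missing_files(data_dir="data"):
    """Return the expected CSV files that are not present in data_dir."""
    folder = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (folder / name).is_file()]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All dataset files found.")
```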
- Clone the repository: `git clone https://github.com/your-username/movie-recommendation-system.git`
- Install the required dependencies: `pip install -r requirements.txt`
- Run the Streamlit app: `streamlit run app.py`
- The app opens in your default browser.
- Matrix factorization: https://www.youtube.com/watch?v=ZspR5PZemcs
- Cosine Similarity: https://www.youtube.com/watch?v=e9U0QAFbfLI
- Resources for SVD:
Special thanks to Jalaj Thanaki for the codebase. I've built a Streamlit wrapper around it and incorporated personal optimizations.