This project implements a movie recommendation system: an ML-based approach to filtering or predicting users' film preferences based on their past choices and behavior. The primary goal of a movie recommendation system is to surface only the movies a given user is most likely to want to watch. We designed our system around several text-to-vector conversion techniques combined with cosine similarity.
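Cosine similarity scores two movie vectors a and b by the cosine of the angle between them, cos(a, b) = (a · b) / (|a| |b|), so two movies whose vectors point in the same direction score 1 regardless of vector length. The snippet below is a minimal NumPy sketch of that formula, not the project's own code; the three toy vectors are hypothetical. scikit-learn's sklearn.metrics.pairwise.cosine_similarity computes the same pairwise matrix.

```python
import numpy as np


def cosine_similarity_matrix(vectors: np.ndarray) -> np.ndarray:
    """Return the pairwise cosine similarity between the rows of `vectors`."""
    # Normalize every row to unit length; all-zero rows are left as zeros.
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.where(norms == 0, 1.0, norms)
    # The dot product of two unit vectors is the cosine of the angle between them.
    return unit @ unit.T


# Three hypothetical "movie" vectors.
movies = np.array([
    [1.0, 2.0, 0.0],
    [2.0, 4.0, 0.0],  # same direction as movie 0, twice as long
    [0.0, 0.0, 3.0],  # orthogonal to movie 0
])
sim = cosine_similarity_matrix(movies)
print(np.round(sim, 3))
```

Movies 0 and 1 get a similarity of 1 even though their vectors have different lengths, which is why cosine similarity suits text vectors of documents with different word counts.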
Text-to-Vector Techniques Used:
- TF-IDF: Term Frequency-Inverse Document Frequency
- Count Vectorizer: Converts text to a bag-of-words representation
- Hash Vectorizer: Converts text to a hashed representation
- Word2Vec: Word embeddings for text representation
- Doc2Vec: Document embeddings for text representation
- GloVe: Global Vectors for Word Representation
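The three sparse techniques in this list (Count Vectorizer, TF-IDF and Hash Vectorizer) can be illustrated without any library. The sketch below is dependency-free and is not the project's code: it assumes plain whitespace tokenization, uses the smoothed idf formula that scikit-learn's TfidfVectorizer applies by default, ln((1 + n) / (1 + df)) + 1 followed by L2 normalization, and implements a simplified, unsigned hashing trick with zlib.crc32 (scikit-learn's HashingVectorizer uses a different hash function and signed buckets by default). The three documents are hypothetical. Word2Vec, Doc2Vec and GloVe instead produce dense, learned vectors and are not shown here.

```python
import math
import zlib
from collections import Counter


def count_vectors(docs):
    """Bag-of-words: one column per vocabulary term, cell = raw term count."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        rows.append([counts[term] for term in vocab])
    return vocab, rows


def tfidf_vectors(docs):
    """TF-IDF with smoothed idf, ln((1 + n) / (1 + df)) + 1, then L2 row normalization."""
    vocab, counts = count_vectors(docs)
    n = len(docs)
    # Document frequency: how many documents contain each term.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log((1 + n) / (1 + d)) + 1 for d in df]
    rows = []
    for row in counts:
        weighted = [tf * w for tf, w in zip(row, idf)]
        norm = math.sqrt(sum(x * x for x in weighted)) or 1.0
        rows.append([x / norm for x in weighted])
    return vocab, rows


def hashed_vectors(docs, n_features=16):
    """Hashing trick: map each token to a fixed bucket, so no vocabulary is stored."""
    rows = []
    for doc in docs:
        row = [0] * n_features
        for tok in doc.lower().split():
            # crc32 is stable across runs, unlike Python's salted built-in hash().
            row[zlib.crc32(tok.encode("utf-8")) % n_features] += 1
        rows.append(row)
    return rows


docs = [
    "space adventure alien war",
    "alien invasion war",
    "romantic comedy wedding",
]
vocab, counts = count_vectors(docs)
_, tfidf = tfidf_vectors(docs)
hashed = hashed_vectors(docs)
print(vocab)
print(counts[0])
```

TF-IDF gives "space" (found in one document) a higher weight than "alien" (found in two) within the first document, whereas raw counts treat them equally; the hashed vectors trade that interpretability for a fixed memory footprint.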
The following open source packages are used in this project:
- NumPy
- Pandas
- Difflib
- Matplotlib
- Gensim (Word2Vec, Doc2Vec)
- Scikit-learn (TfidfVectorizer, CountVectorizer, HashingVectorizer, cosine_similarity)
- NLTK
- FuzzyWuzzy
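Difflib and FuzzyWuzzy both provide fuzzy string matching. One common use in recommenders, assumed here rather than taken from the project code, is mapping a misspelled title typed by the user to the closest title in the dataset before looking up its vector. The sketch below uses only the standard library's difflib.get_close_matches; the helper name find_closest_title and the title list are hypothetical.

```python
import difflib


def find_closest_title(query, titles, cutoff=0.6):
    """Return the dataset title most similar to `query`, or None if nothing is close enough."""
    # Compare case-insensitively but return the title with its original casing.
    lowered = {title.lower(): title for title in titles}
    matches = difflib.get_close_matches(query.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[matches[0]] if matches else None


titles = ["Avatar", "Spectre", "The Dark Knight Rises", "Titanic"]
print(find_closest_title("avatr", titles))  # Avatar
print(find_closest_title("zzzz", titles))   # None
```

The cutoff (0.6 is difflib's default) controls how lenient the match is: raising it avoids surprising matches at the cost of rejecting more typos.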
Our dataset, the "TMDB 5000 Movie Dataset", was obtained from Kaggle and contains 4,803 movies. Its 24 features cover a comprehensive range of details, including movie ID, title, cast members, producers, release year, and other attributes.
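Several of these features, such as genres, keywords, cast and crew, are stored as JSON-encoded lists of objects with a "name" field, so they must be parsed before any text-to-vector technique can use them. The sketch below shows one way to merge them into a single text document per movie; it assumes that JSON layout and a "job" field that identifies directors in the crew list, the helper names and the shortened record are hypothetical, and the exact set of features the project combined is not specified here.

```python
import json


def names(json_text, limit=None):
    """Extract the "name" field from a JSON list such as '[{"id": 28, "name": "Action"}]'."""
    items = json.loads(json_text)
    return [item["name"] for item in items[:limit]]


def build_tags(movie, top_cast=3):
    """Combine several text columns into one lower-case document per movie."""
    # Keep only crew members whose job is "Director".
    directors = [m["name"] for m in json.loads(movie["crew"]) if m.get("job") == "Director"]
    parts = (
        movie["overview"].split()
        + names(movie["genres"])
        + names(movie["keywords"])
        + names(movie["cast"], limit=top_cast)
        + directors
    )
    # Remove spaces inside multi-word names so each name stays a single token.
    return " ".join(p.replace(" ", "").lower() for p in parts)


# Hypothetical, shortened record in the assumed column layout.
movie = {
    "overview": "A marine explores an alien moon",
    "genres": '[{"id": 1, "name": "Action"}, {"id": 2, "name": "Science Fiction"}]',
    "keywords": '[{"id": 3, "name": "alien planet"}]',
    "cast": '[{"name": "Actor One"}, {"name": "Actor Two"}]',
    "crew": '[{"job": "Director", "name": "Director Name"}, {"job": "Producer", "name": "Producer Name"}]',
}
print(build_tags(movie))
```

Joining multi-word names into one token keeps, for example, two actors who share a first name from looking similar to the vectorizer.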
This roadmap outlines the journey from collecting data to creating the recommendation system:
- Data Preprocessing: collecting the required dataset and applying feature extraction to it.
- Testing different text-to-vector techniques: we explored six techniques to determine which one gives the best accuracy.
- Generating recommendations: given an input movie, we generate 30 recommendations by computing the cosine similarity between movie vectors and returning the most similar movies.
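Once every movie has a vector and a pairwise cosine similarity matrix has been computed, generating recommendations reduces to sorting one row of that matrix. The sketch below assumes a precomputed square NumPy similarity matrix whose row order matches a list of titles; the function name recommend, the matrix values and the titles are hypothetical. k defaults to 30 to match the number of recommendations described above.

```python
import numpy as np


def recommend(similarity, titles, query_title, k=30):
    """Return the k titles whose vectors are most cosine-similar to `query_title`."""
    idx = titles.index(query_title)
    scores = similarity[idx]
    # Sort by descending similarity and drop the query movie itself.
    ranked = [i for i in np.argsort(-scores, kind="stable") if i != idx]
    return [titles[i] for i in ranked[:k]]


titles = ["Movie A", "Movie B", "Movie C", "Movie D"]
sim = np.array([
    [1.0, 0.8, 0.1, 0.5],
    [0.8, 1.0, 0.2, 0.4],
    [0.1, 0.2, 1.0, 0.3],
    [0.5, 0.4, 0.3, 1.0],
])
print(recommend(sim, titles, "Movie A", k=2))  # ['Movie B', 'Movie D']
```

The query movie is excluded by index rather than by score, so it is dropped correctly even when another movie has an identical vector and also scores 1.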
🎓 All participants in this project are undergraduate students of Applied Computer Science and Artificial Intelligence @ Sapienza University of Rome
👩 Rokshana Ahmed
Email: [email protected]
GitHub: @RoxyDiya
👩 Elena Martellucci
Email: [email protected]
GitHub: @elena-martellucci
👩 Firdaous Hajjaji
Email: [email protected]
GitHub: @Firdaous2002
✤ This was the final project for the Deep Learning course at Sapienza University of Rome