Movie Recommendation System


📖 Table of Contents

  1. ➤ About The Project
  2. ➤ Prerequisites
  3. ➤ Dataset
  4. ➤ Roadmap
  5. ➤ Contributors

-----------------------------------------------------

📝 About The Project

This project implements a movie recommendation system: an ML-based approach to filtering or predicting users' film preferences based on their past choices and behavior. The primary goal of such a system is to surface only the movies a given user is most likely to want to watch. We designed ours using several text-to-vector conversion techniques combined with cosine similarity.

Text-to-Vector Techniques Used:

  • TF-IDF: Term Frequency-Inverse Document Frequency
  • Count Vectorizer: Converts text to a bag-of-words representation
  • Hash Vectorizer: Converts text to a hashed representation
  • Word2Vec: Word embeddings for text representation
  • Doc2Vec: Document embeddings for text representation
  • GloVe: Global Vectors for Word Representation
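
As a rough illustration of the first three techniques, the sketch below fits the corresponding scikit-learn vectorizers on a tiny made-up corpus and prints the shape of each resulting matrix. The `docs` list and the `n_features=32` setting are assumptions chosen for the example, not values taken from this project's notebooks; Word2Vec, Doc2Vec, and GloVe are omitted because they need trained or pre-trained embeddings.

```python
from sklearn.feature_extraction.text import (
    CountVectorizer,
    HashingVectorizer,
    TfidfVectorizer,
)

# Hypothetical toy corpus: one text blob per movie (not the real dataset).
docs = [
    "space adventure alien crew",
    "space war rebel empire",
    "romantic comedy wedding",
]

# One vectorizer per technique; n_features=32 is an arbitrary small size
# picked for readability (the library default is much larger).
vectorizers = {
    "tfidf": TfidfVectorizer(),
    "count": CountVectorizer(),
    "hash": HashingVectorizer(n_features=32),
}

# Each call returns a sparse matrix with one row per document.
matrices = {name: vec.fit_transform(docs) for name, vec in vectorizers.items()}

for name, matrix in matrices.items():
    print(name, matrix.shape)
```

TF-IDF and the count vectorizer learn a vocabulary from the corpus, so their column count equals the number of distinct tokens, while the hashing vectorizer maps tokens into a fixed number of columns without storing a vocabulary.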

-----------------------------------------------------

🍴 Prerequisites

Made with Python and Jupyter.

The following open-source packages are used in this project:

  • Numpy
  • Pandas
  • Difflib
  • Matplotlib
  • Gensim (Word2Vec, Doc2Vec)
  • Scikit-Learn (TfidfVectorizer, CountVectorizer, HashingVectorizer, cosine_similarity)
  • Nltk
  • Fuzzywuzzy

-----------------------------------------------------

💾 Dataset

Our dataset, the "TMDB 5000 Movie Dataset", was obtained from Kaggle and contains 4803 movies. Its 24 features cover a wide range of details, including movie ID, title, cast members, producers, release year, and assorted other attributes.

-----------------------------------------------------

🎯 Roadmap

This roadmap outlines the journey from collecting data to creating the recommendation system:

  1. Data preprocessing: This includes collecting the required dataset and then applying feature extraction to it.

  2. Testing different text-to-vector conversions: We explored 6 different techniques to determine which one is the most accurate.

  3. Generating recommendations: Given an input movie, we generate 30 recommendations by computing the cosine similarity between movie vectors and returning the most similar movies.
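
The recommendation step can be sketched with the standard library alone. The example below is a minimal illustration under stated assumptions, not the project's actual code: the `movies` catalogue, the `cosine` and `recommend` helpers, and the use of `difflib` to resolve a misspelled title are all hypothetical, and plain word counts stand in for the vectorizers listed earlier.

```python
import difflib
import math
from collections import Counter

# Hypothetical toy catalogue: title -> combined text features.
movies = {
    "Star Quest": "space adventure alien crew",
    "Galaxy War": "space war alien empire",
    "Wedding Bells": "romantic comedy wedding",
}


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(query: str, top_n: int = 2) -> list[str]:
    """Return the top_n titles most similar to the closest-matching title."""
    # Resolve the user's input to a known title (assumed use of difflib).
    match = difflib.get_close_matches(query, list(movies), n=1)
    if not match:
        return []
    title = match[0]
    # Bag-of-words vector per movie; a stand-in for TF-IDF and the others.
    vectors = {t: Counter(text.split()) for t, text in movies.items()}
    target = vectors[title]
    scores = [(cosine(target, v), t) for t, v in vectors.items() if t != title]
    scores.sort(key=lambda s: s[0], reverse=True)
    return [t for _, t in scores[:top_n]]


print(recommend("Star Qest"))  # → ['Galaxy War', 'Wedding Bells']
```

With the real dataset, `top_n` would be set to 30 and the count vectors replaced by whichever text-to-vector technique is being evaluated.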

-----------------------------------------------------

📜 Contributors

🎓 All participants in this project are undergraduate students of Applied Computer Science and Artificial Intelligence @ Sapienza University of Rome

👩 Rokshana Ahmed
      Email: [email protected]
      GitHub: @RoxyDiya

👩 Elena Martellucci
      Email: [email protected]
      GitHub: @elena-martellucci

👩 Firdaous Hajjaji
      Email: [email protected]
      GitHub: @Firdaous2002


This was the final project for the Deep Learning course at Sapienza University of Rome.