User-based-Collaborative-Filtering

User-based Collaborative Filtering in Python (Adapted from University of Minnesota CSci 1901H Class project)

Overview

Implement a simple user-based collaborative filtering recommender system for predicting the ratings of an item using the data given. This prediction should be done using k nearest neighbors and Pearson correlation. Finally using the similarity of the k nearest neighbors, it is required to predict the ratings of the new item for the given user.

Format of ratings file

The input file consists of one rating event per line. Each rating event is of the form: user_id\trating\tmovie_title
The user_id is a string that contains only alphanumeric characters and hyphens and spaces (no tabs).
The rating is one of the float values 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, and 5.0.
The movie_title is a string that may contain space characters (to separate the words).
The three fields -- user_id, rating, and the movie_title -- are separated by a single tab character (\t).

Requirements

pearson_correlation(user1, user2)

This function calculates the pearson correlation between 2 users.
Return value is a float between 1 and -1.
For calculating the average for each user, include all the user’s ratings and not just the intersection of the 2 user’s ratings.
However when computing summation, use only items that both users have rated.

K_nearest_neighbors(user1, k) :

This function calculates the k nearest neighbors of user1 based on pearson similarity.
Returns a list of k nearest neighbors and their similarity.
For calculating the average for each user, include all the user’s ratings and not just the intersection of the 2 user’s ratings.
However when computing summation, use only items that both users have rated.
When sorting similarities, if 2 users have the same similarity sort them by user id.

Predict(user1, item, k_nearest_neighbors) :

This function calculates the final prediction for item for user1 using k nearest neighbors.
You will compute a simple weighted average of the ratings provided by the k nearest neighbors.
Use only the neighbors who have rated the input item.
Prediction = ∑(Wi,1)*(rating i,item) / ∑(Wi,1) where Wi,1 is the similarity of user i with user1 from the k nearest neighbors.

Execution

Python collabFilter.py ratings-dataset.tsv Kluver ‘The Fugitive’ 10 ratings-dataset.tsv: input file Kluver: User id Movie: The Fugitive K: 10

Output:

The program will output:

K nearest neighbors with their user ids and similarity values separated by space as per the output file
Rating prediction for item.

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
collabFilter.py		collabFilter.py
output.txt		output.txt
ratings-dataset.tsv		ratings-dataset.tsv

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

User-based-Collaborative-Filtering

Overview

Format of ratings file

Requirements

Execution

Output:

About

Releases

Packages

Languages

License

ZwEin27/User-based-Collaborative-Filtering

Folders and files

Latest commit

History

Repository files navigation

User-based-Collaborative-Filtering

Overview

Format of ratings file

Requirements

Execution

Output:

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages