Netflix

About

This is a project I wrote as part of my undergraduate degree in Engineering at Cambridge (supervised by Zoubin Ghahramani).

The project applied a Bayesian approach to the Netflix Competition (http://www.netflixprize.com/), a large collaborative filtering challenge in the field of machine learning. I investigated several approaches, including which parts of the dataset to include or leave out, and the project culminated in a dissertation. This is the code I wrote to obtain my results.

In short, Netflix provided a large number of ratings given by users for a set of movies. The challenge was to use that data to predict other genuine ratings on Netflix, and to improve on the predictions of Netflix's own algorithm by 10%. My best submission was roughly a 5.2% improvement on Netflix's own algorithm (not bad!).
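The Netflix Prize scored submissions by root mean squared error (RMSE), so a percentage improvement can be read as a relative reduction in RMSE against the baseline. The following is a minimal sketch of that arithmetic only. The function names `Rmse` and `PercentImprovement` are made up for illustration and are not taken from this repository.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: root mean squared error between predictions and true ratings.
double Rmse(const std::vector<double>& predicted, const std::vector<double>& actual) {
    if (predicted.empty() || predicted.size() != actual.size()) {
        throw std::invalid_argument("inputs must be non-empty and of equal length");
    }
    double sum_sq = 0.0;
    for (std::size_t i = 0; i < predicted.size(); ++i) {
        const double err = predicted[i] - actual[i];
        sum_sq += err * err;
    }
    return std::sqrt(sum_sq / static_cast<double>(predicted.size()));
}

// Hypothetical helper: relative RMSE reduction of a candidate over a baseline, in percent.
double PercentImprovement(double baseline_rmse, double candidate_rmse) {
    assert(baseline_rmse > 0.0);
    return 100.0 * (baseline_rmse - candidate_rmse) / baseline_rmse;
}
```

Under this reading, a candidate with an RMSE of 0.9 against a baseline RMSE of 1.0 is a 10% improvement.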

Cool Feature

One cool feature is the way I packed bits in. A lot of metadata about each rating is bundled into a single 64-bit long. Each field has its own distinct prime "key", and taking the long modulo that key recovers the field's value. For example, if the key for favourite_number is 11, the key for birthday_month is 13 and the 64-bit long is 97, then the favourite_number value is 9 (97 % 11) and the birthday_month value is 6 (97 % 13). Each value must be smaller than its key. The only tricky bit is creating the 64-bit long from the values it has to hold: that's done by NetflixDataNG::GetBigRating.
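To make the packing step concrete, here is a minimal sketch of one way to build such a long, based on the Chinese Remainder Theorem. It is not the code in NetflixDataNG::GetBigRating, which may work differently. `PackFields` and `UnpackField` are hypothetical names, and the sketch assumes the keys are distinct primes, each value is smaller than its key, and the product of all keys fits in 64 bits.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical helper (not the repository's GetBigRating).
// Each field is a (key, value) pair: keys are distinct primes, value < key.
std::uint64_t PackFields(
    const std::vector<std::pair<std::uint64_t, std::uint64_t>>& fields) {
    std::uint64_t packed = 0;   // satisfies every (key, value) pair seen so far
    std::uint64_t modulus = 1;  // product of the keys seen so far
    for (const auto& field : fields) {
        const std::uint64_t key = field.first;
        const std::uint64_t value = field.second;
        if (key == 0 || value >= key) {
            throw std::invalid_argument("value must be smaller than its key");
        }
        if (modulus > std::numeric_limits<std::uint64_t>::max() / key) {
            throw std::overflow_error("product of keys does not fit in 64 bits");
        }
        // Adding multiples of `modulus` keeps all earlier remainders intact,
        // so step through them until the remainder for this key matches.
        std::uint64_t t = 0;
        while ((packed + modulus * t) % key != value) {
            if (++t == key) {
                throw std::invalid_argument("keys must be distinct primes");
            }
        }
        packed += modulus * t;
        modulus *= key;
    }
    return packed;
}

// Reading a field back is just the modulo operation described above.
std::uint64_t UnpackField(std::uint64_t packed, std::uint64_t key) {
    assert(key != 0);
    return packed % key;
}
```

With the keys from the example above, `PackFields({{11, 9}, {13, 6}})` returns 97, and `UnpackField(97, 11)` and `UnpackField(97, 13)` give back 9 and 6. The inner loop tries at most `key` candidates per field, which is cheap for small prime keys.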

Data

I did some pre-processing on the dataset (which I think you can still download) before reading it into this program. The pre-processing code is in PreProc.cpp and PreProc.h.

Compilation

cd Netflix
mkdir build
cd build
cmake ..
make

Credits

Zoubin Ghahramani and Sinead Williamson both helped in supervising me. The code began as a C++ port of Vibes by John Winn (http://johnwinn.org/). It was ported because the Java version could not fit the whole Netflix dataset in memory at once, whereas the C++ version can on my 8GB RAM desktop. Where Vibes is a general-purpose tool with a GUI, this code is designed for the single purpose of using variational inference to solve a particular model for the Netflix problem.

Disclaimer

I'm afraid this is largely PhD-ware. If you have any questions, I'm happy to answer them.

