A model that aims to predict the outcome of an NHL playoff series before the first game is played. The model is currently an ensemble of five bagged elastic net models, trained on data from the 2006-2019 NHL playoffs.
The estimated performance is 0.65742 log loss on data from 2006 onward when Time Related Features are included. These features are currently inaccessible because of a Natural Stat Trick bug, and without them the estimated log loss is 0.67949.
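For context on the figures above, the sketch below computes binary log loss and combines member predictions by simple averaging. It is written in Python for illustration only, and the function names are hypothetical. The averaging step is an assumption, since this README does not say how the five bagged models are combined. A model that always predicts 0.5 scores ln 2 ≈ 0.693, which is the coin-flip baseline that 0.65742 and 0.67949 should be compared against.

```python
import math


def log_loss(y_true, p_pred, eps=1e-15):
    """Mean binary log loss (cross-entropy); lower is better.

    y_true: 1 if the first-listed team won the series, else 0.
    p_pred: predicted probability that the first-listed team wins.
    """
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip so log() never sees 0
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)


def ensemble_probability(member_probs):
    """Average the win probabilities of the ensemble members (assumed scheme)."""
    return sum(member_probs) / len(member_probs)


# Five hypothetical member predictions for one series, then the baseline score.
p = ensemble_probability([0.6, 0.7, 0.5, 0.65, 0.55])
baseline = log_loss([1, 0], [0.5, 0.5])  # equals ln 2
```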
Dependencies:
- glmnet
- caret
- tidyverse
- recipes
- moments
- ParBayesianOptimization
- parallel
- fastknn
- RSelenium, plus a working Selenium installation on your local machine
- rvest
Repository structure:
- data: Contains all raw, processed, and external data sets that the model is fit on.
- src: Contains all modelling, prediction, and scraping scripts in R, as well as the templates used to build the dataset.
- tests: Contains data sets with known correct values, used to check the validity of scraped input.
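The idea behind the tests folder can be sketched as a row-by-row comparison between freshly scraped data and a reference table whose values are known to be correct. This is a hypothetical Python illustration, not the repository's actual check: the function name, the list-of-dicts layout, and the key column are all assumptions.

```python
def find_mismatches(scraped, expected, key, tol=1e-9):
    """Compare scraped rows against known-correct reference rows.

    scraped, expected: lists of dicts, one dict per row.
    key: column that identifies a row (hypothetical, e.g. a team id).
    Returns (row_id, column, scraped_value, expected_value) tuples;
    column is None when the whole row is missing from the scrape.
    """
    scraped_by_id = {row[key]: row for row in scraped}
    problems = []
    for ref in expected:
        row_id = ref[key]
        got = scraped_by_id.get(row_id)
        if got is None:
            problems.append((row_id, None, None, None))
            continue
        for col, want in ref.items():
            have = got.get(col)
            if isinstance(want, float) and isinstance(have, (int, float)):
                ok = abs(have - want) <= tol  # tolerate float rounding
            else:
                ok = have == want
            if not ok:
                problems.append((row_id, col, have, want))
    return problems
```

An empty result means the scrape matches the reference data; anything else points to the exact row and column that changed.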
For validation of the model:
- Download/clone the repository.
- Set your working directory to the root of the downloaded repository on your local computer.
- Run "src/modelling/main/bagged-elastic-net-bayesian.R".
For prediction of new data:
- Download/clone the repository.
- Set your working directory to the root of the downloaded repository on your local computer.
- Run "src/prediction/R/final-model.R". The new_data object must contain the observations to be predicted, with their "result_factor" values left missing (NA).
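The requirement on "result_factor" reflects a common pattern: rows with a known outcome are used for fitting, and rows whose outcome is missing are the ones to predict. The Python sketch below shows that split in general terms. It assumes rows are dicts and uses None in place of R's NA. The function name and the "series" field are hypothetical and do not describe how final-model.R is written.

```python
def split_for_prediction(rows, target="result_factor"):
    """Split rows into (known outcome, outcome to predict).

    A row counts as 'to predict' when the target is absent or None,
    which plays the role of NA here.
    """
    train = [r for r in rows if r.get(target) is not None]
    to_predict = [r for r in rows if r.get(target) is None]
    return train, to_predict


rows = [
    {"series": "BOS-TOR", "result_factor": "W"},   # finished series
    {"series": "NYR-NJD", "result_factor": None},  # outcome unknown
    {"series": "EDM-LAK"},                         # outcome unknown
]
train, to_predict = split_for_prediction(rows)
```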
To do:
1. Finish the scraping automation.
2. Redo the modelling script.
3. Add 2013 data. Improve the model's log loss to below 0.67.
4. Document how to run the scripts from scratch.
Data Pulled From:
https://www.corsicahockey.com/
http://www.puckon.net/
https://evolving-hockey.com/
http://www.espn.com/
https://www.nhl.com/
https://www.oddsportal.com/
https://www.naturalstattrick.com/
https://www.hockey-reference.com/
Credit for the Elo calculator formulas goes to the owner of HockeyAnalytics. The source page can be found here:
http://hockeyanalytics.com/2016/07/elo-ratings-for-the-nhl/
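For readers unfamiliar with Elo, the sketch below implements the standard Elo expected-score and update rules in Python. It is a generic illustration and not the HockeyAnalytics formulas. The linked article may use a different K factor or add adjustments (for example, for home ice), and the K = 20 used here is an assumption.

```python
def expected_score(rating_a, rating_b):
    """Probability that team A beats team B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_ratings(rating_a, rating_b, score_a, k=20.0):
    """Return the new (rating_a, rating_b) after one game.

    score_a: 1 if A won, 0 if A lost.
    k: update step size (assumed value, not taken from the source page).
    Points gained by one team are lost by the other, so the total is unchanged.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# Two evenly rated teams: the winner gains k/2 points.
new_a, new_b = update_ratings(1500.0, 1500.0, 1)
```

With equal ratings the expected score is 0.5, so a win moves each rating by half of K. Upsets against a stronger opponent move ratings further.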
A big thanks to my good friend Mr. Riley Peters for spotting logic flaws in some of the preprocessing and for offering many suggestions about the game of hockey.