We plan to build a machine learning program that predicts the results of the 2018 college basketball March Madness tournament. The predictions could be used to bet on individual games or to fill out your own bracket for a chance to win the jackpot.
Use a Kaggle dataset containing tournament and regular season data, seedings, and other variables to predict potential standings in the 2018 March Madness tournament. Experiment with different classifiers and different types of input data to find the combination that predicts best.
Figure out how to import the data with pandas or related Python data packages such as NumPy. Maybe graph the data to spot trends and isolate good features (input data) for our program. Also try to weed out features that are too strongly correlated with each other.
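The correlation check mentioned above could look roughly like the sketch below. It is only an illustration: the column names (wins, win_pct, avg_points, seed) and the 0.9 threshold are assumptions, and the data is randomly generated. In practice the frame would come from pd.read_csv on one of the Kaggle CSV files, whose real column names will differ.

```python
import numpy as np
import pandas as pd

# Hypothetical per-team season stats, generated at random for illustration.
rng = np.random.default_rng(0)
n = 200
wins = rng.integers(5, 30, size=n)
df = pd.DataFrame({
    "wins": wins,
    "win_pct": wins / 35.0,                  # deliberately redundant with "wins"
    "avg_points": rng.normal(72, 6, size=n),
    "seed": rng.integers(1, 17, size=n),
})

def drop_correlated(frame, threshold=0.9):
    """Drop the later column of every pair whose |Pearson r| exceeds threshold."""
    corr = frame.corr().abs()
    # Keep only the upper triangle so each pair of columns is checked once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return frame.drop(columns=to_drop), to_drop

reduced, dropped = drop_correlated(df)
print("dropped:", dropped)
print(reduced.head())
```

Calling df.corr() on its own and plotting it with matplotlib would give the same information visually, which may be a better first step before dropping anything.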
Learn how to use scikit-learn with simpler classifiers, and think about how they apply to our project.
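As a starting point, a simple scikit-learn classifier might be tried like this. This is a minimal sketch under stated assumptions: logistic regression is just one possible "simpler classifier", the single feature (seed difference between two teams) is hypothetical, and the game outcomes are synthetic rather than taken from the Kaggle data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n_games = 1000

# Hypothetical feature: seed of team A minus seed of team B
# (negative means team A is the better-seeded team).
seed_diff = rng.integers(-15, 16, size=n_games)

# Synthetic outcomes: the better seed wins more often, with some noise.
p_a_wins = 1.0 / (1.0 + np.exp(0.25 * seed_diff))
a_wins = (rng.random(n_games) < p_a_wins).astype(int)

X = seed_diff.reshape(-1, 1).astype(float)
X_train, X_test, y_train, y_test = train_test_split(
    X, a_wins, test_size=0.25, random_state=0
)

model = LogisticRegression()
model.fit(X_train, y_train)

# Accuracy on held-out games, and the predicted chance a 1 seed beats a 16 seed.
accuracy = model.score(X_test, y_test)
prob_1_vs_16 = model.predict_proba([[1 - 16]])[0, 1]
print(f"held-out accuracy: {accuracy:.3f}")
print(f"P(1 seed beats 16 seed): {prob_1_vs_16:.3f}")
```

The predict_proba output is what would matter for bracket building, since it gives a win probability for each matchup instead of only a winner.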
Continue getting comfortable with scikit-learn by trying more advanced classifiers, and start deciding which one to use.
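One way to decide between classifiers is to score each with cross-validation on the same data. The sketch below assumes three candidate models (logistic regression, random forest, gradient boosting) and two hypothetical matchup features; both the candidates and the synthetic data are placeholders, not choices the project has made.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(7)
n_games = 600

# Hypothetical matchup features (team A minus team B); not real Kaggle columns.
X = np.column_stack([
    rng.integers(-15, 16, size=n_games),   # seed difference
    rng.normal(0, 8, size=n_games),        # average scoring margin difference
])

# Synthetic outcomes: 1 if team A wins, driven by both features plus noise.
logit = -0.2 * X[:, 0] + 0.1 * X[:, 1]
y = (rng.random(n_games) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

# Mean 5-fold cross-validated accuracy for each candidate.
scores = {
    name: cross_val_score(clf, X, y, cv=5, scoring="accuracy").mean()
    for name, clf in candidates.items()
}
best = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
print("best:", best)
```

Accuracy is used here for simplicity; if the goal is win probabilities for a bracket, a probability-based metric such as scoring="neg_log_loss" may be a better basis for comparison.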
Begin writing a Python program using our chosen classifier and input data.
Continue coding until finished, debug errors, finalize product.
" "
" "
Hopefully finish. If we finish early, we can also pick up another sports-related dataset or do more with the predictions, such as comparing tournaments. WIP
Kaggle
Python (pandas, scikit-learn, matplotlib) for algorithm and loading/manipulating data
Free food? :D And maybe more members who want to learn, or already know, machine learning.