For the second straight year I entered the Kaggle competition to see who can predict the most accurate March Madness bracket.
GitHub Link: https://lnkd.in/d6auw9X
Kaggle Link: https://lnkd.in/gx5CaHhx
—————–2021 Results—————–
In 2021, I combined the Kaggle input data with Kenpom data, then trained a Gradient Boosting Classifier to predict the results.
My resulting bracket was pretty decent. It had an overall log loss of 0.66587, which put me at 503rd (out of 707) on the Kaggle March Madness prediction leaderboard.
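For context on that number, here is a minimal sketch of how binary log loss is computed. The function and the toy outcomes are my own illustration, not the competition's scoring code. A useful reference point: predicting 0.5 for every game scores ln(2), about 0.693, so lower is better and 0.66587 is a modest improvement over a coin flip.

```python
import math

def log_loss(y_true, p_pred, eps=1e-15):
    """Mean binary log loss. y_true holds 0/1 outcomes, p_pred holds P(outcome = 1)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        # Clip so that a confident wrong prediction does not produce log(0).
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Toy outcomes (made up for illustration): 1 means the first-listed team won.
outcomes = [1, 0, 1, 1]
coin_flip = [0.5] * len(outcomes)

print(log_loss(outcomes, coin_flip))             # equals ln(2)
print(log_loss(outcomes, [0.9, 0.2, 0.7, 0.6]))  # better-calibrated picks score lower
```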
—————2022 Predictions—————
For 2022, I followed a similar approach. This time, however, I trained and tested multiple classification models to find the best predictor (which in this case was an Extra Trees Classifier).
Here are the steps I used to build the model:
(1) Import Train & Test Sets and combine with Kenpom data
(2) Prepare sets for model training
(3) Use Lazy Predict to test multiple model types
(4) Apply an ExtraTreesClassifier to predict the win probability of each potential game
(5) Analyze Results
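As a rough sketch of steps (2) through (4), the snippet below compares a few scikit-learn classifiers by cross-validated log loss and then uses an ExtraTreesClassifier to produce win probabilities. It is not the code from my repo: the data is synthetic (make_classification stands in for the merged Kaggle and Kenpom features), the manual comparison loop stands in for Lazy Predict, and the hyperparameters are arbitrary assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for per-game features (assumption: the real features
# would come from the merged Kaggle + Kenpom data, which is not shown here).
X, y = make_classification(n_samples=400, n_features=8, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# A small hand-picked set of candidates, standing in for Lazy Predict's sweep.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
    "extra_trees": ExtraTreesClassifier(n_estimators=200, random_state=0),
}

scores = {}
for name, model in candidates.items():
    # scikit-learn reports negated log loss, so values closer to 0 are better.
    scores[name] = cross_val_score(
        model, X_train, y_train, cv=5, scoring="neg_log_loss"
    ).mean()

for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: mean CV log loss = {-score:.4f}")

# Fit the chosen model type and predict a win probability for each held-out game.
model = ExtraTreesClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
win_prob = model.predict_proba(X_test)[:, 1]  # P(class 1), i.e. the first-listed team wins
print(win_prob[:5])
```

Scoring the candidates on log loss rather than accuracy matches how the Kaggle leaderboard ranks submissions, since it rewards well-calibrated probabilities and not just correct picks.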
——————Analysis———————
The results of my model were again mostly chalk (picking top seeds to win). This makes sense, because many of the same input factors used in my model are also used in the seeding process.
However, there were a few interesting upsets projected:
First Round
(10) Loyola Chicago over (7) Ohio St [0.54 probability]
(10) San Francisco over (7) Murray St [0.51 probability]
(9) TCU over (8) Seton Hall [0.50005 probability]
Later Round
(6) LSU over (3) Wisconsin [0.51 probability]
(5) Houston over (4) Illinois [0.55 probability]
(5) Iowa over (4) Providence [0.61 probability]
(5) Connecticut over (4) Arkansas [0.52 probability]
(3) Tennessee over (2) Villanova [0.54 probability] and over (1) Arizona [0.54 probability]
(3) Texas Tech over (2) Duke [0.56 probability] (Good riddance Coach K)
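For anyone curious how a list like this falls out of the predictions, here is a minimal sketch. I am assuming an upset means the team with the larger seed number gets a win probability above 0.5. The Matchup type and is_upset helper are hypothetical names for illustration, and only a few of the games above are included.

```python
from typing import NamedTuple

class Matchup(NamedTuple):
    seed_a: int
    team_a: str
    seed_b: int
    team_b: str
    prob_a: float  # model's probability that team_a wins

def is_upset(m: Matchup, threshold: float = 0.5) -> bool:
    """True when the worse-seeded team_a is favored by the model."""
    return m.seed_a > m.seed_b and m.prob_a > threshold

games = [
    Matchup(10, "Loyola Chicago", 7, "Ohio St", 0.54),
    Matchup(9, "TCU", 8, "Seton Hall", 0.50005),
    Matchup(5, "Iowa", 4, "Providence", 0.61),
    # Same first game seen from the favorite's side (1 - 0.54): not an upset.
    Matchup(7, "Ohio St", 10, "Loyola Chicago", 0.46),
]

# Keep only the projected upsets, most confident first.
upsets = sorted((m for m in games if is_upset(m)), key=lambda m: m.prob_a, reverse=True)

for m in upsets:
    print(f"({m.seed_a}) {m.team_a} over ({m.seed_b}) {m.team_b} [{m.prob_a} probability]")
```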