Wednesday, April 7, 2021

A Simple model that earned a Silver medal in predicting the results of the NCAAW tournament

This year I decided to join the March Machine Learning Mania 2021 - NCAAW challenge on Kaggle. It proposes to predict the outcome of each game into the basketball NCAAW tournament, which is a tournament for women at college level. Participants can assign a probability to each outcome and they're ranked on the leaderboard according to the accuracy of their prediction. One of the most attractive elements of the challenge is that the leaderboard is updated after each game throughout the tournament.

Since I have limited knowledge of basketball I decided to use a minimalistic model:
  • It uses three features that are easy to interpret: seed, percentage of victories, and the average score of each team.
  • It is based on linear Linear Regression, and it's tuned to predict extreme probability values only for games that are easy to predict.
The following visualizations give insight into how the model estimates the winning probability in a game between two teams:

Surprisingly, this model ranked 46th out of 451 submissions, placing itself in the top 11% of the leaderboard and earning a silver medal!

The notebook with the solution and some more charts can be found here.