I Let a Machine Fill Out My Bracket

Every March, I fill out a bracket with the same strategy: gut feeling, a vague memory of who looked good in February, and the unshakeable belief that this is finally the year a 16-seed makes a deep run. It never works. So this year I tried something different. I built a machine learning model to do it for me, borrowing heavily from the people who do this for real (and using Claude Code, of course).

Standing on Kaggle's Shoulders

The approach here isn't original. Kaggle runs the March Machine Learning Mania competition every year, challenging data scientists to predict tournament outcomes. I built my model on the foundation laid by Jared Cross's 1st place solution from the 2024 competition, along with the Nate Silver/538 methodology (power rating differential divided by 11, normal CDF) as a sanity check baseline. The competition datasets and the winning approaches are all public, which is what makes a project like this possible for someone who runs a school district by day and trains models by night.
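That 538 baseline is simple enough to fit in a few lines. A minimal sketch, with hypothetical power ratings as inputs (the divide-by-11 scale is the one the methodology describes):

```python
from math import erf, sqrt

def baseline_win_prob(rating_a, rating_b, scale=11.0):
    """538-style baseline: normal CDF of the power-rating gap.

    Ratings here are hypothetical placeholders; the divisor of 11
    is the scale cited above.
    """
    z = (rating_a - rating_b) / scale
    # Standard normal CDF, written in terms of the error function
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))
```

Under this baseline, an 11-point rating edge works out to roughly an 84% win probability.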

The Data

I pulled 66 datasets covering 18 years of NCAA tournament history. KenPom efficiency ratings, Barttorvik metrics, team resumes, Elo ratings, quad records, shooting splits, coaching histories, conference stats. If someone tracks it, I downloaded it.

From those raw numbers, the model extracts 45+ features for every team: adjusted offensive and defensive efficiency, Dean Oliver's four factors (effective field goal percentage, turnover rate, offensive rebound rate, free throw rate), shooting versatility, defensive pressure, talent ratings, experience, even average height. Then it engineers 10 composite features on top of that, things like quality win percentage and ball security scores.
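For the curious, the four factors reduce to a handful of box-score formulas. A sketch using hypothetical stat key names (the 0.44 free-throw coefficient is the standard possession adjustment; the formulas themselves are Oliver's):

```python
def four_factors(s):
    """Dean Oliver's four factors from season box-score totals.

    Key names in `s` are hypothetical placeholders; the formulas
    are the standard ones.
    """
    efg = (s["fgm"] + 0.5 * s["fg3m"]) / s["fga"]             # effective FG%
    tov = s["tov"] / (s["fga"] + 0.44 * s["fta"] + s["tov"])  # turnover rate
    orb = s["orb"] / (s["orb"] + s["opp_drb"])                # off. rebound rate
    ftr = s["fta"] / s["fga"]                                 # free throw rate
    return {"efg": efg, "tov_rate": tov, "orb_rate": orb, "ft_rate": ftr}
```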

For every historical tournament game since 2008, the model computes the difference in each of those features between the two teams. That's the training data: 1,070 games where we know who won and by how much.
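Building those difference rows is the least glamorous step. A minimal sketch with made-up team names and features, which also mirrors each game so the model sees both orientations (a common trick, not necessarily the exact construction used here):

```python
# Hypothetical per-team feature vectors; names and values are placeholders
FEATURES = {
    "TeamA": {"adj_off_eff": 118.2, "adj_def_eff": 94.1},
    "TeamB": {"adj_off_eff": 112.5, "adj_def_eff": 98.7},
}

def matchup_rows(winner, loser, features=FEATURES):
    """Two training rows per game: the winner's perspective (label 1)
    and the mirrored orientation (label 0)."""
    w, l = features[winner], features[loser]
    return [
        {**{k: w[k] - l[k] for k in w}, "label": 1},
        {**{k: l[k] - w[k] for k in w}, "label": 0},
    ]
```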

The Model

A single algorithm wasn't going to cut it. The final model is a stacked ensemble, four base classifiers (logistic regression, gradient boosting, random forest, and histogram gradient boosting) feeding into a meta-learner that weighs their predictions. Think of it as a committee of statisticians who each see the data differently, with a fifth statistician deciding who to trust on any given matchup.

The model hit 73.6% accuracy on a held-out test set. For context, picking the higher seed every game gets you roughly 65%. So the model is finding real signal in the efficiency data beyond what seed lines already tell you.

The single most predictive feature? Barttorvik's adjusted efficiency margin, and it wasn't close. That metric alone carried nearly twice the weight of the next most important feature. Wins Above Bubble, defensive four factors, and assist rate rounded out the top five. Seed difference, the thing most casual bracket-pickers anchor on, mattered less than you'd think.

The Bracket

Once trained, the model does two things. First, it picks a deterministic bracket: for every possible matchup, it takes the team with the higher win probability. Second, it runs 10,000 Monte Carlo simulations, randomly sampling outcomes based on those probabilities, to generate championship odds for every team.
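The Monte Carlo loop itself is small. A toy sketch of the simulation step, using a stand-in rating-based win probability in place of the trained model's outputs:

```python
import random
from collections import Counter

def win_prob(team_a, team_b, ratings):
    """Stand-in for the model: logistic curve on a rating gap."""
    diff = ratings[team_a] - ratings[team_b]
    return 1.0 / (1.0 + 10 ** (-diff / 10.0))

def simulate_bracket(teams, ratings, rng):
    """Play one single-elimination tournament; return the champion."""
    field = list(teams)
    while len(field) > 1:
        field = [
            a if rng.random() < win_prob(a, b, ratings) else b
            for a, b in zip(field[::2], field[1::2])
        ]
    return field[0]

def championship_odds(teams, ratings, n_sims=10_000, seed=42):
    """Championship share per team across n_sims simulated brackets."""
    rng = random.Random(seed)
    counts = Counter(simulate_bracket(teams, ratings, rng) for _ in range(n_sims))
    return {t: counts[t] / n_sims for t in teams}
```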

The results:

- Duke: 31.2%

- Michigan: 25.2%

- Arizona: 16.0%

- Houston: 10.8%

- Florida: 4.8%

Those top four teams account for over 83% of simulated championships. The model is not subtle about the talent gap between the top tier and everyone else this year.

Tennessee, for what it's worth, comes in at 0.2%. I'm choosing to interpret that as "mathematically possible."

The Technical Detour

Here's the part they don't mention in the tutorials. I built this whole thing locally, 1,254 lines of Python, and it wouldn't run. My M3 Max locked up trying to fit the stacked ensemble. This is the second time in two days a machine learning project has tried to kill my laptop (the TN school letter grades analysis did the same thing with XGBoost hyperparameter tuning).

The fix was simple: upload everything to Google Drive and run it in Colab. Free cloud compute, all the sklearn dependencies pre-installed, no kernel panics. The whole pipeline, data loading, training, 10,000 bracket simulations, HTML generation, ran in a couple of minutes.

If you're doing ML work on a Mac and things start freezing, don't fight it. Just move to Colab. Your laptop will thank you.

What I Actually Learned

The model confirmed something I already suspected: efficiency margins are the whole game. Not record, not conference strength, not recruiting rankings. How many points you score per possession versus how many you allow. That's it. Everything else is noise or a downstream effect of that core metric.
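That core metric is cheap to compute from season totals. A sketch using the standard possession estimate (the 0.44 free-throw coefficient again), with hypothetical key names:

```python
def efficiency_margin(team):
    """Points scored minus points allowed, per 100 possessions.

    Uses the standard possession estimate; key names in `team`
    are hypothetical placeholders.
    """
    poss = team["fga"] - team["orb"] + team["tov"] + 0.44 * team["fta"]
    off_eff = 100.0 * team["pts_for"] / poss
    def_eff = 100.0 * team["pts_against"] / poss
    return off_eff - def_eff
```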

It also reminded me that 73.6% accuracy means the model is wrong more than one game in four. March Madness is chaotic by design. Single-elimination tournaments reward variance, and no amount of feature engineering will predict the kid who hits a half-court buzzer-beater.

But that's the fun of it. The model gives you a framework, a set of informed probabilities. What you do with those probabilities is still up to you.