Calibration
When the model says X%, does the favored team win X% of the time? The line marks the predicted probability; the filled bar shows the actual win rate.
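The bucketing behind a chart like this is straightforward. Below is a minimal sketch, using hypothetical sample predictions (the real data has thousands of games): group predictions into probability buckets and compare each bucket's average predicted probability against the actual win rate.

```python
# Hypothetical (predicted win probability, favored team won) pairs.
preds = [
    (0.62, True), (0.58, False), (0.71, True), (0.66, True),
    (0.83, True), (0.79, False), (0.55, False), (0.90, True),
]

def calibration_buckets(preds, width=0.1):
    """Group predictions into probability buckets; for each bucket,
    report its size, average predicted probability, and actual win rate."""
    buckets = {}
    for p, won in preds:
        key = round(int(p / width) * width, 1)   # e.g. 0.62 -> 0.6 bucket
        buckets.setdefault(key, []).append((p, won))
    rows = []
    for key in sorted(buckets):
        group = buckets[key]
        avg_pred = sum(p for p, _ in group) / len(group)
        win_rate = sum(won for _, won in group) / len(group)
        rows.append((key, len(group), avg_pred, win_rate))
    return rows

for lo, n, pred, actual in calibration_buckets(preds):
    print(f"{lo:.0%} bucket: n={n}, predicted {pred:.0%}, actual {actual:.0%}")
```

A well-calibrated model shows `predicted` and `actual` tracking each other in every bucket with enough games in it.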
By Confidence Tier
Accuracy when the model is highly confident in its pick.
| Confidence | Games | Accuracy |
|---|---|---|
| ≥ 60% | 2325 | 69% |
| ≥ 65% | 2001 | 70% |
| ≥ 70% | 1572 | 73% |
| ≥ 75% | 1227 | 74% |
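The tier table above can be reproduced with a simple threshold filter. This is a sketch on hypothetical data, not the site's actual pipeline: for each cutoff, keep only games where the favorite's probability met the cutoff, then measure how often the favorite won.

```python
# Hypothetical (favored team's win probability, favored team won) pairs.
games = [
    (0.61, True), (0.63, False), (0.68, True), (0.72, True),
    (0.76, True), (0.66, True), (0.81, False), (0.77, True),
]

def tier_accuracy(games, thresholds=(0.60, 0.65, 0.70, 0.75)):
    """Accuracy of the favored pick, restricted to games at or above
    each confidence threshold."""
    table = []
    for t in thresholds:
        tier = [won for prob, won in games if prob >= t]
        acc = sum(tier) / len(tier) if tier else None
        table.append((t, len(tier), acc))
    return table

for t, n, acc in tier_accuracy(games):
    print(f">= {t:.0%}: {n} games, {acc:.0%} accuracy")
```

Note that the tiers are cumulative (a 78% pick counts in every row), which is why the game counts shrink as the threshold rises.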
Score Projection Accuracy
How close were the Monte Carlo score projections to actual results? The model's own projected run differential acts as the spread.
Based on 2980 games with score projections. “Own-line cover rate” = how often the actual margin exceeded the model's projected margin in the predicted direction.
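The own-line cover computation can be sketched as follows, on hypothetical margins (all from the favorite's side of the projection): count how often the actual margin beat the projected margin.

```python
# Hypothetical (projected margin, actual margin) pairs,
# both measured as favorite minus underdog.
projections = [
    (2.5, 4), (3.1, 1), (1.8, 3), (4.0, 2), (2.2, 5), (1.5, -2),
]

def own_line_cover_rate(projections):
    """Share of games where the favorite exceeded the model's own
    projected margin -- i.e. 'covered' the model's spread."""
    covers = sum(actual > projected for projected, actual in projections)
    return covers / len(projections)

def avg_margin_error(projections):
    """Mean absolute difference between projected and actual margin."""
    return sum(abs(a - p) for p, a in projections) / len(projections)

print(f"Own-line cover rate: {own_line_cover_rate(projections):.0%}")
```

A rate near 50% is the self-consistent outcome: the model's projected margin splits its own outcomes down the middle.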
Top Teams by Elo
Rolling Elo rating (1500 = average). Updated after every game result.
| Rank | Team | Games | Elo |
|---|---|---|---|
| 1 | Texas Longhorns | 267 | 1797 |
| 2 | North Carolina Tar Heels | 267 | 1774 |
| 3 | Southern Miss Golden Eagles | 174 | 1765 |
| 4 | Auburn Tigers | 253 | 1757 |
| 5 | UCLA Bruins | 251 | 1751 |
| 6 | LSU Tigers | 283 | 1741 |
| 7 | Oregon State Beavers | 268 | 1738 |
| 8 | Florida State Seminoles | 253 | 1738 |
| 9 | Florida Gators | 282 | 1736 |
| 10 | Mississippi State Bulldogs | 248 | 1735 |
D1 Diamond Top 25 — 2026 Season
Composite ranking of all D1 teams with ≥5 games played.
| # | Team | Record | Elo | Run Diff | Score |
|---|---|---|---|---|---|
| 1 | TEX | 16–0 | 1797 | +7.6 | 94 |
| 2 | GT | 15–2 | 1704 | +9.5 | 89 |
| 3 | UGA | 15–3 | 1709 | +8.4 | 87 |
| 4 | TA&M | 15–1 | 1726 | +7.2 | 87 |
| 5 | MSST | 15–2 | 1735 | +6.8 | 85 |
| 6 | AUB | 14–2 | 1757 | +5.1 | 84 |
| 7 | CLEM | 15–2 | 1734 | +5.2 | 84 |
| 8 | OU | 14–2 | 1734 | +6.4 | 84 |
| 9 | UNC | 15–3 | 1774 | +5.8 | 83 |
| 10 | MISS | 15–3 | 1719 | +4.9 | 83 |
| 11 | FLA | 15–3 | 1736 | +4.1 | 82 |
| 12 | FSU | 13–3 | 1738 | +4.6 | 82 |
| 13 | USM | 15–2 | 1765 | +3.5 | 82 |
| 14 | USC | 15–0 | 1710 | +5.0 | 82 |
| 15 | UCLA | 14–2 | 1751 | +5.8 | 81 |
| 16 | NCST | 14–3 | 1673 | +7.9 | 80 |
| 17 | LSU | 13–5 | 1741 | +3.9 | 79 |
| 18 | WAKE | 15–2 | 1705 | +4.7 | 79 |
| 19 | UK | 15–2 | 1692 | +4.6 | 78 |
| 20 | ALA | 15–3 | 1722 | +4.1 | 78 |
| 21 | ORE | 13–3 | 1703 | +5.7 | 78 |
| 22 | UVA | 14–3 | 1709 | +5.1 | 77 |
| 23 | ORST | 11–4 | 1738 | +1.8 | 76 |
| 24 | TENN | 13–4 | 1717 | +3.4 | 76 |
| 25 | UCSB | 13–2 | 1682 | +4.1 | 76 |
Minimum 5 games played. wOBA/FIP from FanGraphs.
How to Read This
Overall Accuracy — % of games where the model's favored team won. A coin flip is 50%; sharp sports models typically land 58–65% on college baseball.
Brier Score — measures probability calibration quality. Lower is better. A model that always predicts 50% scores 0.25; a perfect predictor scores 0.
Log Loss — penalizes confident wrong predictions more harshly than the Brier score. Lower is better. The random baseline is ~0.693 (ln 2); well-calibrated sports models typically land in the 0.55–0.62 range.
AUC — probability that the model ranks a random home win above a random home loss. 0.5 = no skill; 1.0 = perfect ranking. Measures discrimination independently of calibration.
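The three metrics above are easy to compute directly. A tiny worked example on four hypothetical (predicted home-win probability, home team won) pairs:

```python
import math

data = [(0.8, True), (0.6, True), (0.7, False), (0.3, False)]

def brier(data):
    """Mean squared error between predicted probability and outcome."""
    return sum((p - won) ** 2 for p, won in data) / len(data)

def log_loss(data):
    """Average negative log-likelihood of the observed outcomes."""
    return -sum(math.log(p if won else 1 - p) for p, won in data) / len(data)

def auc(data):
    """Probability that a randomly drawn win is ranked above a
    randomly drawn loss (ties count half)."""
    wins = [p for p, won in data if won]
    losses = [p for p, won in data if not won]
    pairs = [(w, l) for w in wins for l in losses]
    score = sum(1.0 if w > l else 0.5 if w == l else 0.0 for w, l in pairs)
    return score / len(pairs)

print(brier(data))    # 0.195
print(auc(data))      # 0.75
```

Note the 0.7 prediction on a loss: it barely moves the Brier score but is the dominant term in the log loss, which is exactly the asymmetry described above.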
Calibration chart — a perfectly calibrated model's bar would exactly touch the line in every bucket. A bar extending past the line (actual above predicted) means the model was underconfident in that bucket; a bar falling short means overconfident.
Score projection — tracks how close the Monte Carlo run totals were to reality. “Avg margin error” is the mean absolute difference between projected and actual run differential. “Own-line cover rate” treats the model's own projected margin as the spread — above 50% means the model tends to under-project winning margins.
Elo ratings — self-correcting power ratings that update after every game. 1500 is average; top programs typically reach 1600–1700 by mid-season.
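The self-correction works through the standard Elo update: the winner gains rating in proportion to how unexpected the win was. A minimal sketch, where the K-factor of 20 is an assumption for illustration, not the site's actual parameter:

```python
K = 20  # hypothetical K-factor, not the site's tuned value

def expected_score(rating_a, rating_b):
    """Win expectancy for team A under the logistic Elo model."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))

def update(rating_a, rating_b, a_won, k=K):
    """Apply one game result; rating points are zero-sum."""
    exp_a = expected_score(rating_a, rating_b)
    actual = 1.0 if a_won else 0.0
    delta = k * (actual - exp_a)
    return rating_a + delta, rating_b - delta

# A 1700 team beating a 1500 team gains only ~5 points,
# because the win was already ~76% expected.
new_a, new_b = update(1700, 1500, a_won=True)
```

An upset moves ratings much more than an expected result, which is what lets the ratings correct themselves game by game.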