How well do ranking models predict regular-season game outcomes?
Fraction of games where the predicted favourite won (higher is better)
Mean squared error of probability predictions (lower is better)
Cross-entropy loss of probability predictions (lower is better)
Each point is a 5% probability bucket. On the diagonal = perfectly calibrated. Above = model underestimates; below = overestimates.
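For readers who want to reproduce these four measures, here is a minimal sketch in Python. It assumes each game is reduced to one predicted win probability for a reference side and a 0/1 outcome for that side. The function names, the tie rule at exactly 50%, and the clipping constant `eps` are illustrative assumptions, not taken from the page, and the sample numbers are made up.

```python
import math
from collections import defaultdict


def accuracy(probs, outcomes):
    """Fraction of games where the predicted favourite won.

    Assumption: a probability of exactly 0.5 counts as picking side 1.
    """
    hits = sum((p >= 0.5) == bool(y) for p, y in zip(probs, outcomes))
    return hits / len(probs)


def brier_score(probs, outcomes):
    """Mean squared error between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def log_loss(probs, outcomes, eps=1e-15):
    """Mean cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1 - eps)
        total += -math.log(p) if y else -math.log(1 - p)
    return total / len(probs)


def calibration_buckets(probs, outcomes, width=0.05):
    """Group predictions into fixed-width buckets (5% by default).

    Returns {bucket_index: (mean_predicted, observed_win_rate, count)}.
    """
    n_buckets = round(1 / width)
    grouped = defaultdict(list)
    for p, y in zip(probs, outcomes):
        # A probability of exactly 1.0 falls into the last bucket.
        idx = min(int(p * n_buckets), n_buckets - 1)
        grouped[idx].append((p, y))
    result = {}
    for idx, pairs in sorted(grouped.items()):
        mean_pred = sum(p for p, _ in pairs) / len(pairs)
        win_rate = sum(y for _, y in pairs) / len(pairs)
        result[idx] = (mean_pred, win_rate, len(pairs))
    return result


# Made-up sample: four games, probability that side 1 wins, and the result.
sample_probs = [0.8, 0.6, 0.3, 0.9]
sample_outcomes = [1, 0, 0, 1]
print(accuracy(sample_probs, sample_outcomes))
print(brier_score(sample_probs, sample_outcomes))
print(log_loss(sample_probs, sample_outcomes))
print(calibration_buckets(sample_probs, sample_outcomes))
```

If the calibration chart plots observed win rate against mean predicted probability (an assumption about its axes), a bucket whose `observed_win_rate` exceeds its `mean_predicted` sits above the diagonal, which is the "model underestimates" case.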