Phil
Stats Geek
One of the Val Ackerman proposals is to switch to a double-bye format (similar to the way the old Big East Tournament worked). I did some mathematical modeling for Val Ackerman to demonstrate the impact of the switch. I've copied a revised version of my letter to her below. (The table doesn't show up here; see this Google Doc for the revised letter with the table.)
I would love to have some feedback, both on the concept itself, and on how best to make the case.
__________________________________________________________________
Dear Val Ackerman,
As discussed, I promised to do some mathematical modeling to estimate the impact of a double-bye format compared to the traditional format. I have some preliminary conclusions.
The main differences:
- Under a double-bye format, the chance that the lowest seeds can advance is materially improved
- Under a double-bye format, the margins of victory are likely to be materially smaller, which reduces the number of blowouts, some of which are effectively decided by halftime
Impact of switching to a double-bye format:
- Under the traditional format (TF), the chance that a 16 seed will advance is under 1 in 10,000. With four such matchups each year, we could expect to wait about 3,500 years before seeing a 16 seed advance (at present levels of parity).[1]
- In contrast, under a double-bye format (DBF), we can expect to see a 16 seed advance every five years or so.
- Under TF, we can expect a 15 seed to advance every seven years or so.
- Under DBF, we can expect one almost every year.
- Under TF, we expect the 16 seeds to lose by almost 40 points. While some will be closer, some will have even higher margins.
- Under DBF, the 16 seed is still expected to lose, but by 16 points or so. That will still mean a loss in most cases, but some games will be competitive late in the game.
- Under DBF, the expected margin of defeat for the 15 seed drops from 18 points to 8, which means many games will have single-digit margins late in the game.
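For readers who want to check the arithmetic, here is a minimal sketch of one way figures like these can be derived. It assumes the favorite's margin of victory is normally distributed around the expected margin with a standard deviation of 10 points, and that there are four such matchups per year. The normal distribution is an assumption of this sketch, not something the letter states; the function names are hypothetical, and 38 is an assumed reading of "almost 40."

```python
import math


def upset_probability(expected_margin, sd=10.0):
    """Chance the underdog wins, i.e. the favorite's margin falls below zero.

    Assumes margin ~ Normal(expected_margin, sd); this is an illustrative
    assumption, not necessarily the model used in the letter.
    """
    # Normal CDF evaluated at zero: Phi(-mu/sd) = 0.5 * erfc(mu / (sd * sqrt(2)))
    return 0.5 * math.erfc(expected_margin / (sd * math.sqrt(2.0)))


def years_per_upset(expected_margin, sd=10.0, games_per_year=4):
    """Average number of years between upsets, given games_per_year matchups."""
    return 1.0 / (games_per_year * upset_probability(expected_margin, sd))


scenarios = [
    ("16 seed, TF", 38.0),   # assumed value for "almost 40 points"
    ("16 seed, DBF", 16.0),
    ("15 seed, TF", 18.0),
    ("15 seed, DBF", 8.0),
]

for label, margin in scenarios:
    p = upset_probability(margin)
    wait = years_per_upset(margin)
    print(f"{label}: P(upset) = {p:.5f}, about one upset every {wait:,.1f} years")
```

Under these assumptions the results land close to the figures quoted above: roughly 3,500 years versus 5 years for the 16 seed, and roughly 7 years versus a little over 1 year for the 15 seed.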
I’ll summarize these results in a table:
[See link to see the table]
These are dramatic differences. There are differences that apply to the other seeds as well, but the differences are smaller.
I’ve made a number of key assumptions to produce these results. I think they are reasonable for the purpose of illustrating how dramatic the differences are, but they should be refined before these numbers are used more formally. In particular, one of the key variables is the standard deviation of the margin of victory. I haven’t found a solid source for this value, but some sources suggest a value of about 10 points. I’ve used this value, but would like to find a better study to support whatever value is best. A related assumption, almost certainly imperfect, is stationarity; that is, that the standard deviation can be treated as a constant. It is likely to vary with the seeding pair; however, the broad conclusions are unlikely to change materially when this assumption is refined.
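As a rough illustration of how sensitive the conclusions are to the standard deviation, the sketch below recomputes the average wait for a 16 seed upset under both formats for standard deviations of 8, 10, and 12 points. It assumes normally distributed margins and four matchups per year, neither of which is stated in the letter. The expected margins of 38 (for "almost 40") and 16 points, the alternative standard deviations, and the function name are all assumptions of this sketch.

```python
from statistics import NormalDist


def wait_years(expected_margin, sd, games_per_year=4):
    """Average years between upsets if margin ~ Normal(expected_margin, sd)."""
    # Probability that the favorite's margin is below zero (the underdog wins)
    p_upset = NormalDist(mu=expected_margin, sigma=sd).cdf(0.0)
    return 1.0 / (games_per_year * p_upset)


TF_MARGIN = 38.0   # assumed expected margin, 1 vs 16 under the traditional format
DBF_MARGIN = 16.0  # expected margin for the 16 seed under the double-bye format

for sd in (8.0, 10.0, 12.0):
    tf = wait_years(TF_MARGIN, sd)
    dbf = wait_years(DBF_MARGIN, sd)
    print(f"sd = {sd:>4}: TF about {tf:,.0f} years, DBF about {dbf:,.1f} years")
```

The individual waiting times move a great deal as the standard deviation changes, but in each case the traditional-format wait remains far longer than the double-bye wait, which is the sense in which the broad conclusion survives.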
While I do have some statistical experience (I am an actuary and a former member of the Board of Directors of the Casualty Actuarial Society), I do want to run my results by someone with more extensive experience. I have already received some preliminary comments from Stuart Klugman, who is not only a statistical expert but literally wrote the book.[2] He has agreed to help refine the model, if needed.
I hope you will find these results helpful, and if so, I will undertake to improve the model so that the assumptions are more rigorous.
Sincerely,
Stephen W. Philbrick
[1] We all know that a 16 seed has advanced. But we also know that this was a seeding anomaly. Stanford earned a one seed, in no small part due to the contributions of All-Americans Nygaard and Folkl. Both players were injured after the regular season ended but before the NCAA tournament, and neither was able to play. Had they been uninjured, Stanford likely would have won. While injuries are a part of the game, the loss of two All-Americans between the end of the regular season and the beginning of the tournament is an extremely unlikely event.
[2] See Loss Models: From Data to Decisions, by Stuart A. Klugman, Harry H. Panjer, and Gordon E. Willmot.