Sports Stats for Nerds
I figured we could use a thread for some egghead discussion about advanced stats.
I have been thinking a bit about expected results/wins in the NCAA tournament by seed. We all know that by seed, a 1 seed should make the Final Four, a 2 seed the Elite Eight, etc. We also know that realistically, the actual expected values won't match those. For example, only about 40% of 1 seeds actually make the Final Four, so the expected # of wins should probably be a bit less than 4.
So, I found some numbers for win % by round by seed and saw some interesting patterns. By the usual expected value calculation, you can just add up the measured percentage of teams that win in each round to get expected wins. For example, the expected wins for a 1 seed come out to 3.36, which seems about right.
But then I hit some curious numbers. Using this method, the expected wins for 10 and 11 seeds are 0.64 and 0.60, even though they only win their first game about 38% and 37% of the time. The higher expected win totals happen because some of these teams win 2, 3, or even 4 games. But those expected win numbers being over 0.5 give me pause. If a 10 seed should advance only about 38% of the time, it seems counterintuitive that the expected wins values imply you're more likely to get 1 win than 0.
So, what's a better way to measure expected results? Look at median wins by seed? Something else? Any thoughts?
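One way to compare the candidates (mean, median, most likely outcome) is to turn the round-by-round percentages into a full distribution of win totals. The sketch below does that for a 10 seed. Only the 0.38 first-round figure comes from the numbers above; the later-round values are hypothetical placeholders picked so the total lands near 0.64, and the function names are made up for illustration.

```python
# reach[k-1] = probability of winning at least k games.
# 0.38 is the first-round figure quoted in the post; every later value is a
# HYPOTHETICAL placeholder chosen so that the total is roughly 0.64.
reach = [0.38, 0.17, 0.06, 0.02, 0.007, 0.003]


def expected_wins(reach):
    """Expected wins = sum over rounds of P(at least k wins)."""
    return sum(reach)


def win_distribution(reach):
    """Return P(exactly k wins) for k = 0 .. len(reach)."""
    at_least = [1.0] + list(reach) + [0.0]
    return [at_least[k] - at_least[k + 1] for k in range(len(reach) + 1)]


def median_wins(dist):
    """Smallest win total whose cumulative probability reaches 50%."""
    cumulative = 0.0
    for wins, p in enumerate(dist):
        cumulative += p
        if cumulative >= 0.5:
            return wins
    return len(dist) - 1


def mode_wins(dist):
    """Single most likely win total."""
    return max(range(len(dist)), key=lambda k: dist[k])


dist = win_distribution(reach)
print(expected_wins(reach), median_wins(dist), mode_wins(dist))
```

With these inputs the mean is about 0.64 while the median and the most likely outcome are both 0 wins, so an expected value above 0.5 does not by itself say that one win is more likely than none. The mean is pulled up by the occasional long run.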
Totally Kafkaesque
Re: Sports Stats for Nerds
Here, someone else calculated expected wins the same way I did. They also included the standard deviations. http://bracketodds.cs.illinois.edu/seedadv.html
It's also interesting to note that 10 and 11 seeds do better than 9 seeds. 12s are pretty close too.
Totally Kafkaesque
Re: Sports Stats for Nerds
I put something like this together a few years back to show how shitty Jay Wright is in the tourney. (Didn't age well!) The 8/9 vs 10/11/12 thing makes sense just because of the matchup with the 1, so without reseeding you probably shouldn't adjust that out. The 0.64 thing makes sense too ... 0.38 would be the 1st round xW and then an overall ~25% chance of winning from there on out.
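Written out, the split described here looks roughly like the following, where W (a symbol introduced only for this note) is the number of wins for a 10 seed and six rounds are assumed:

```latex
\mathbb{E}[W] \;=\; \sum_{k=1}^{6} P(W \ge k)
\;=\; \underbrace{P(W \ge 1)}_{\approx 0.38}
\;+\; \underbrace{\sum_{k=2}^{6} P(W \ge k)}_{\approx 0.64 - 0.38 = 0.26}
\;\approx\; 0.64
```

The second term is the expected number of additional wins after the first round, averaged over all 10 seeds including the ones that lose their opener, which is where the "~25%" comes from.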
Re: Sports Stats for Nerds
mister d wrote (Tue Mar 13, 2018 8:41 am): I put something like this together a few years back to show how shitty Jay Wright is in the tourney. (Didn't age well!) The 8/9 vs 10/11/12 thing makes sense just because of the matchup with the 1, so without reseeding you probably shouldn't adjust that out. The 0.64 thing makes sense too ... 0.38 would be the 1st round xW and then an overall ~25% chance of winning from there on out.
Yeah, it makes sense in terms of pure expected wins/value. But, to use your example, say Jay Wright is coaching a 10 seed and his team loses in the first round. That's not a bad result; it's the expected result over 60% of the time. But an interpretation of a 0.64 expected win value says that a loss is more bad than a single win is good. So it doesn't seem like the right way to judge the performance of teams/coaches.
Totally Kafkaesque
Re: Sports Stats for Nerds
Yeah, I don't think you can really use it as a single season projection or measuring stick. Without factoring in sample size, you can probably use W-xW to conclude John Giannini is the greatest tournament coach of all-time. But ... there's also data that strongly suggests 10s are historically where underseeds happen ...
- DSafetyGuy
- The Dude
- Posts: 8866
- Joined: Mon Mar 18, 2013 12:29 pm
- Location: Behind the high school
Re: Sports Stats for Nerds
How do you adjust for the quality of the teams who are playing as opposed to the seed number alone? There's a world of complaining about who is in/out every year with comparatively little complaining about the seedings.
For example, last year, Wichita State was a seven-seed in their regional. Kenpom had them as the sixth-best team in the country prior to the tournament. St. Mary's was also a seven-seed, but Pomeroy had them as #14 in the nation.
“The running, the jumping... a celebration of life.”
Re: Sports Stats for Nerds
DSafetyGuy wrote (Tue Mar 13, 2018 9:23 am): How do you adjust for the quality of the teams who are playing as opposed to the seed number alone? There's a world of complaining about who is in/out every year with comparatively little complaining about the seedings.
For example, last year, Wichita State was a seven-seed in their regional. Kenpom had them as the sixth-best team in the country prior to the tournament. St. Mary's was also a seven-seed, but Pomeroy had them as #14 in the nation.
I think there are probably two ways to evaluate tournament performance. The simpler one is to just use seeds, as I am trying. The other way, more accurate but way harder, is to use a proper rating system, like Pomeroy or Sagarin, and evaluate each game separately. I suspect that over a large enough sample size, the seed method will be nearly as accurate, because seeding mistakes happen both ways.
Totally Kafkaesque
- Steve of phpBB
- The Dude
- Posts: 8664
- Joined: Mon Mar 11, 2013 10:44 am
- Location: Feeling gravity's pull
Re: Sports Stats for Nerds
How legit is it to base projections like this on something subjective like seeding? It seems to me that the seeding process involves so much judgment and guesswork by the folks doing the seeding that it really isn't much of an objective measure of anything.
Maybe a better question - how reliable or variable are the expected win projections? Are the results fairly consistent or are they all over the map? (This is one of the many times that my last statistics class was in 1985.)
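On the variability question, one rough way to put a number on it is to compute a standard deviation from the same round-by-round percentages used for expected wins. The sketch below assumes you have P(at least k wins) for each round; the example values are the 0.38 first-round figure for 10 seeds from earlier in the thread plus hypothetical placeholders for the later rounds, and the function name is made up.

```python
import math


def wins_mean_and_sd(reach):
    """Mean and standard deviation of wins.

    reach[k-1] is P(at least k wins). Uses the identities
    E[W] = sum_k P(W >= k) and E[W^2] = sum_k (2k - 1) * P(W >= k).
    """
    mean = sum(reach)
    second_moment = sum((2 * k - 1) * p for k, p in enumerate(reach, start=1))
    variance = second_moment - mean ** 2
    return mean, math.sqrt(variance)


# 0.38 is quoted in the thread; the remaining values are HYPOTHETICAL.
ten_seed_reach = [0.38, 0.17, 0.06, 0.02, 0.007, 0.003]
mean, sd = wins_mean_and_sd(ten_seed_reach)
print(f"mean={mean:.2f} sd={sd:.2f}")
```

If the spread comes out close to or larger than the mean, as it does with these placeholder inputs, a single tournament says very little about whether a team over- or under-performed its seed.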
And his one problem is he didn’t go to Russia that night because he had extracurricular activities, and they froze to death.
Re: Sports Stats for Nerds
I wouldn't be positive seeding mistakes are evenly distributed. The committee has biases at an entity level (even if members turn over) assuming they use similar tools and follow their own historical trends year to year. Like it would seem completely illogical that the higher surviving seed in a 1st round matchup would have a worse historical record than the lower seed in the next round but the 7/10 has that. #7s are .309 in the 2nd round (.295 against #2s) while #10s are .451 (.391). Sooo, circling back, a #10 losing to the #7 isn't worse than winning is good, but you can project that at least one #10 has been underseeded to the point they're a serious threat to run through the #2.
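To see how the second-round records combine with the first-round odds, the figures can be chained together. This is only a sketch: it borrows the roughly 38% first-round win rate for 10 seeds quoted earlier in the thread, assumes the 7 seed wins whenever the 10 seed loses, and assumes both sets of figures describe the same sample of games, which the thread does not confirm.

```python
# Figures quoted in the thread.
p10_first = 0.38    # 10 seeds win their first game about 38% of the time
p7_second = 0.309   # 7 seeds' record in the 2nd round
p10_second = 0.451  # 10 seeds' record in the 2nd round

# ASSUMPTION: the 7 seed advances exactly when the 10 seed does not.
p7_first = 1 - p10_first

# Chance of winning both games, i.e. reaching the Sweet 16.
p7_sweet16 = p7_first * p7_second
p10_sweet16 = p10_first * p10_second

print(f"7 seed:  {p7_sweet16:.3f}")
print(f"10 seed: {p10_sweet16:.3f}")
```

Under those assumptions the 7 seed still reaches the Sweet 16 a bit more often overall (about 19% against about 17%), even though the 10 seeds that survive the first round have the better second-round record.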
- Steve of phpBB
- The Dude
- Posts: 8664
- Joined: Mon Mar 11, 2013 10:44 am
- Location: Feeling gravity's pull
Re: Sports Stats for Nerds
mister d wrote (Tue Mar 13, 2018 9:46 am): I wouldn't be positive seeding mistakes are evenly distributed. The committee has biases at an entity level (even if members turn over) assuming they use similar tools and follow their own historical trends year to year. Like it would seem completely illogical that the higher surviving seed in a 1st round matchup would have a worse historical record than the lower seed in the next round but the 7/10 has that. #7s are .309 in the 2nd round (.295 against #2s) while #10s are .451 (.391). Sooo, circling back, a #10 losing to the #7 isn't worse than winning is good, but you can project that at least one #10 has been underseeded to the point they're a serious threat to run through the #2.
I think I understand what you're saying, but can you clarify what you think are the implications of seeding mistakes not being evenly distributed?
And his one problem is he didn’t go to Russia that night because he had extracurricular activities, and they froze to death.
Re: Sports Stats for Nerds
Either accidental, where the committee might view a certain type of team (late risers, teams with mediocre records but missing a star player who is now back) as "#10s" or intentional, where they either invert their 7/10 or intentionally create a high risk 5/12 matchup knowing that particular game draws tons of attention.
- Pruitt
- The Dude
- Posts: 18105
- Joined: Tue Jun 04, 2013 10:02 am
- Location: North Shore of Lake Ontario
Re: Sports Stats for Nerds
Larry Fitzgerald may retire
But as great as he is and has been, this stat is astounding:
"Even more mind-boggling is that on 2,263 targets, the veteran wideout has only dropped 29 passes. He has more career tackles (39) than drops."
Incredible when you first read it, and it gets more incredible the more it sinks in.
"beautiful, with an exotic-yet-familiar facial structure and an arresting gaze."
Re: Sports Stats for Nerds
Ha, I completely forgot about this thread. I started it just a few days before UVA's historic first-round loss. And I'm sure I was thinking at the time about expected tournament results due to Tony Bennett's reputation (unfair I thought and think) as an underachiever in the tournament. I'm pretty sure I dropped this plan after the UMBC loss!
Totally Kafkaesque
Re: Sports Stats for Nerds
In only one season where Tom Brady started more than one game did his team fail to make the playoffs. And that was 2002 when the Pats went 9-7 and he led the league in TD passes.
Canadian International