Sports Stats for Nerds

Post by **Shirley** » Tue Mar 13, 2018 8:21 am

I figured we could use a thread for some egghead discussion about advanced stats.

I have been thinking a bit about expected results/wins in the NCAA tournament by seed. We all know that by seed, a 1 seed should make the Final Four, a 2 seed the Elite Eight, etc. We also know that realistically, the actual expected values won't match those. For example, only about 40% of 1 seeds actually make the Final Four, so the expected # of wins should probably be a bit less than 4.

So, I found some numbers of win % by round by seed and saw some interesting patterns. By typical expected value calculations, you get expected wins by just adding up the percentage of teams at a seed that win in each round. For example, the expected wins for a 1 seed comes out to 3.36, which seems about right.
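A minimal sketch of that calculation. It assumes each per-round figure is the share of teams at a seed that won at least k games, so expected wins is their sum (for a whole-number win count W, E[W] = sum over k of P(W >= k)). The 1-seed rates below are made up for illustration and are not the data used in the post; the helper name `expected_wins` is hypothetical.

```python
def expected_wins(p_at_least):
    """Expected tournament wins from per-round advancement rates.

    p_at_least[k - 1] is the share of teams at a seed that won at least
    k games (won the round of 64, the round of 32, and so on). For a
    whole-number win count W, E[W] = sum over k of P(W >= k).
    """
    # A team can't win round k+1 without winning round k.
    for earlier, later in zip(p_at_least, p_at_least[1:]):
        if later > earlier:
            raise ValueError("advancement rates must not increase by round")
    return sum(p_at_least)


# Made-up 1-seed rates for games won in: R64, R32, Sweet 16, Elite Eight, Final Four, final.
one_seed = [0.99, 0.85, 0.70, 0.40, 0.25, 0.15]
print(f"{expected_wins(one_seed):.2f}")  # 3.34
```

The 0.40 in the fourth slot is the "made the Final Four" rate, which is why it pulls the total well under 4.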

But then I hit some curious numbers. Using this method, the expected wins for 10 and 11 seeds are 0.64 and 0.60, even though they only win their first game about 38% and 37% of the time. The higher expected win totals happen because some of these teams win 2, 3, or even 4 games. But expected win numbers over .5 give me pause. If a 10 seed advances only about 38% of the time, it seems counterintuitive that the expected win values imply you're more likely to get 1 win than 0.

So, what's a better way to measure expected results? Look at median wins by seed? Something else? Any thoughts?
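One way to compare the options is to turn the per-round rates into a full distribution of win totals and read off the mean, the median, and the chance of going winless. A sketch with hypothetical helper names and a made-up 10-seed profile: only the 0.38 first-round rate and the 0.64 total come from the thread; the later rounds are invented so the numbers add up.

```python
def win_distribution(p_at_least):
    """Convert P(W >= k) for k = 1..n into P(W = k) for k = 0..n."""
    survival = [1.0] + list(p_at_least) + [0.0]  # P(W >= 0) = 1, P(W >= n+1) = 0
    return [survival[k] - survival[k + 1] for k in range(len(survival) - 1)]


def median_wins(p_at_least):
    """Smallest win total k with P(W <= k) >= 0.5."""
    cdf = 0.0
    for k, p in enumerate(win_distribution(p_at_least)):
        cdf += p
        if cdf >= 0.5:
            return k
    return len(p_at_least)  # only reached through rounding error


# Made-up 10-seed profile: 0.38 in round 1 (from the thread), later rounds invented.
ten_seed = [0.38, 0.16, 0.06, 0.03, 0.01, 0.0]
dist = win_distribution(ten_seed)
mean = sum(k * p for k, p in enumerate(dist))

print(f"mean {mean:.2f}, median {median_wins(ten_seed)}, P(0 wins) {dist[0]:.2f}")
# mean 0.64, median 0, P(0 wins) 0.62
```

Under these assumed numbers the mean sits above 0.5 while the median is 0. There's no contradiction: the mean is pulled up by the few 10 seeds that make deep runs, and it doesn't say that 1 win is more likely than 0.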

Post by **Shirley** » Tue Mar 13, 2018 8:26 am

Here, someone else calculated expected wins the same way I did. They also included the standard deviations. http://bracketodds.cs.illinois.edu/seedadv.html

It's also interesting to note that 10 and 11 seeds do better than 9 seeds. 12s are pretty close too.
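The standard deviations on the linked page can come from the same per-round rates. This is a sketch of one way to compute them, not necessarily how that site does it, using the same made-up 10-seed profile (only the 0.38 first-round rate is from the thread) and a hypothetical helper name.

```python
import math


def wins_mean_sd(p_at_least):
    """Mean and standard deviation of tournament wins from P(W >= k), k = 1..n.

    Uses E[W] = sum P(W >= k) and E[W^2] = sum (2k - 1) * P(W >= k),
    both valid for a whole-number W >= 0.
    """
    mean = sum(p_at_least)
    second_moment = sum((2 * k - 1) * p for k, p in enumerate(p_at_least, start=1))
    return mean, math.sqrt(second_moment - mean ** 2)


# Same made-up 10-seed profile: 0.38 in round 1 (from the thread), later rounds invented.
mean, sd = wins_mean_sd([0.38, 0.16, 0.06, 0.03, 0.01, 0.0])
print(f"mean {mean:.2f}, sd {sd:.2f}")  # mean 0.64, sd 1.02
```

A standard deviation larger than the mean is a hint that a single season's result says little on its own.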

Post by **mister d** » Tue Mar 13, 2018 8:41 am

I put something like this together a few years back to show how shitty Jay Wright is in the tourney. (Didn't age well!) The 8/9 vs 10/11/12 thing makes sense just because of the matchup with the 1, so without reseeding you probably shouldn't adjust that out. The 0.64 thing makes sense too ... 0.38 would be the 1st round xW and then an overall ~25% chance of winning from there on out.
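That split can be written out directly. Winning the first game is worth exactly its probability, so the rest of the 0.64 comes from later rounds, and dividing by 0.38 shows what a 10 seed that survives round 1 adds on average. A sketch using only the two 10-seed numbers from the thread; the function name is hypothetical.

```python
def split_expected_wins(xw_total, p_first_round):
    """Split expected wins into the round-of-64 part and everything after it.

    Returns (later_xw, later_xw_given_r64_win): the expected wins from later
    rounds, and the same figure for teams that actually won their first game.
    """
    later = xw_total - p_first_round  # a first-round win is worth exactly its probability
    return later, later / p_first_round


later, given_win = split_expected_wins(0.64, 0.38)  # 10-seed figures from the thread
print(f"{later:.2f} later-round xW, {given_win:.2f} per 10 seed that wins in round 1")
# 0.26 later-round xW, 0.68 per 10 seed that wins in round 1
```

The 0.26 is the "~25%" in the post: it is an expected number of later wins rather than a single probability.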

Post by **Shirley** » Tue Mar 13, 2018 8:46 am

> **mister d wrote** (Tue Mar 13, 2018 8:41 am):
>
> I put something like this together a few years back to show how shitty Jay Wright is in the tourney. (Didn't age well!) The 8/9 vs 10/11/12 thing makes sense just because of the matchup with the 1, so without reseeding you probably shouldn't adjust that out. The 0.64 thing makes sense too ... 0.38 would be the 1st round xW and then an overall ~25% chance of winning from there on out.

Yeah, it makes sense in terms of pure expected wins/value. But, to use your example, say Jay Wright is coaching a 10 seed and his team loses in the first round. That's not a bad result; it's what happens over 60% of the time. But taken at face value, an expected win value of about 0.6 says that a loss is worse than a single win is good. So it doesn't seem like the right way to judge the performance of teams/coaches.
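The asymmetry shows up if you list W - xW for each possible outcome. A sketch with a made-up 10-seed distribution (only the 62% first-round loss rate is from the thread; `residuals_by_outcome` is a hypothetical helper):

```python
def residuals_by_outcome(dist, xw):
    """(wins, W - xW, probability) for every possible win total."""
    return [(k, k - xw, p) for k, p in enumerate(dist)]


# Made-up 10-seed distribution over 0..6 wins. Only P(0 wins) = 1 - 0.38 comes from the thread.
dist = [0.62, 0.22, 0.10, 0.03, 0.02, 0.01, 0.0]
xw = sum(k * p for k, p in enumerate(dist))  # 0.64 with these numbers

for wins, credit, p in residuals_by_outcome(dist, xw):
    if p > 0:
        print(f"{wins} wins: W - xW = {credit:+.2f} (happens {p:.0%} of the time)")

# Weighted by how often each outcome happens, the credits cancel out.
average_credit = sum(credit * p for _, credit, p in residuals_by_outcome(dist, xw))
assert abs(average_credit) < 1e-9
```

A first-round exit costs 0.64 while a one-win run earns only +0.36, but under these numbers the exit is nearly three times as likely, so the two balance in expectation. The penalty only looks lopsided when a single season is read in isolation.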

Post by **mister d** » Tue Mar 13, 2018 9:18 am

Yeah, I don't think you can really use it as a single-season projection or measuring stick. Without factoring in sample size, you could probably use W-xW to conclude John Giannini is the greatest tournament coach of all time. But ... there's also data that strongly suggests 10s are historically where underseeding happens ...
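One rough way to factor in sample size is to divide a coach's total W - xW by the spread you'd expect from chance alone. A sketch under loud assumptions: tournaments are treated as independent, the per-appearance standard deviations are supplied by hand, and the coach and numbers are made up.

```python
import math


def career_w_minus_xw(appearances):
    """Total W - xW over a coach's appearances, plus a rough z-score.

    appearances: list of (wins, expected_wins, sd_of_wins) tuples, one per
    tournament. Treats tournaments as independent, so variances add.
    """
    surplus = sum(w - xw for w, xw, _ in appearances)
    sd_total = math.sqrt(sum(sd ** 2 for _, _, sd in appearances))
    return surplus, surplus / sd_total


# Made-up coach: two trips as a 10 seed (xW 0.64, sd assumed to be 1.0), one Final Four run.
coach = [(4, 0.64, 1.0), (0, 0.64, 1.0)]
surplus, z = career_w_minus_xw(coach)
print(f"W - xW = {surplus:+.2f}, z = {z:.2f}")  # W - xW = +2.72, z = 1.92
```

A +2.72 surplus looks enormous, but with only two appearances it is still under two standard deviations from zero, which is the small-sample trap behind the Giannini joke.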

Post by **mister d** » Tue Mar 13, 2018 9:19 am

Win % for the better seed, by matchup:

7/10: .614

2/7: .705
2/3: .650
2/10: .609

Post by **DSafetyGuy** » Tue Mar 13, 2018 9:23 am

How do you adjust for the quality of the teams who are playing as opposed to the seed number alone? There's a world of complaining about who is in/out every year with comparatively little complaining about the seedings.

For example, last year, Wichita State was a seven-seed in their regional. KenPom had them as the sixth-best team in the country prior to the tournament. St. Mary's was also a seven-seed, but Pomeroy had them at #14 in the nation.

Post by **Shirley** » Tue Mar 13, 2018 9:34 am

> **DSafetyGuy wrote** (Tue Mar 13, 2018 9:23 am):
>
> How do you adjust for the quality of the teams who are playing as opposed to the seed number alone? There's a world of complaining about who is in/out every year with comparatively little complaining about the seedings.
>
> For example, last year, Wichita State was a seven-seed in their regional. KenPom had them as the sixth-best team in the country prior to the tournament. St. Mary's was also a seven-seed, but Pomeroy had them at #14 in the nation.

I think there are probably two ways to evaluate tournament performance. The simpler one is to just use seeds, as I am trying to do. The other way, more accurate but way harder, is to use a proper rating system, like Pomeroy or Sagarin, and evaluate each game separately. I suspect that over a large enough sample, the seed method will be nearly as accurate, because seeding mistakes happen both ways.
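For the game-by-game approach, one common way to turn two ratings into a single-game win probability is Bill James's log5 formula. This is a swapped-in illustration, not necessarily what Pomeroy or Sagarin do internally, and the ratings below are hypothetical win probabilities against an average team.

```python
def log5(p_a, p_b):
    """Log5: chance A beats B, given each side's win probability
    against an average team."""
    return (p_a - p_a * p_b) / (p_a + p_b - 2 * p_a * p_b)


def game_by_game_surplus(results):
    """Sum of (actual - expected) over the games a team actually played.

    results: list of (won, p_team, p_opponent), where won is True or False.
    """
    return sum((1 if won else 0) - log5(p_team, p_opp) for won, p_team, p_opp in results)


# Hypothetical ratings, not real KenPom or Sagarin values.
games = [
    (True, 0.90, 0.60),   # won a game it was favored to win
    (False, 0.90, 0.95),  # lost to a stronger opponent
]
print(f"{game_by_game_surplus(games):+.3f}")  # -0.179
```

Scoring only the games actually played means a team isn't charged for matchups it never reached, which is the main difference from the seed-based xW.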

Post by **Steve of phpBB** » Tue Mar 13, 2018 9:38 am

How legit is it to base projections like this on something subjective like seeding? It seems to me that the seeding process involves so much judgment and guesswork by the folks doing the seeding that it really isn't much of an objective measure of anything.

Maybe a better question - how reliable or variable are the expected win projections? Are the results fairly consistent or are they all over the map? (This is one of the many times that my last statistics class was in 1985.)
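On reliability: each seed-level percentage is estimated from a limited number of teams, so it carries a standard error. A rough sketch using the normal approximation; the sample size is an assumption (every 1 seed from 1985 through 2017), not necessarily what the source data covered.

```python
import math


def proportion_ci(p_hat, n, z=1.96):
    """Standard error and normal-approximation 95% interval for a rate measured on n teams."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return se, (p_hat - z * se, p_hat + z * se)


# Assumption: ~40% of 1 seeds reach the Final Four, measured over 132 teams
# (four 1 seeds a year for the 33 tournaments from 1985 through 2017).
se, (low, high) = proportion_ci(0.40, 132)
print(f"se {se:.3f}, 95% interval {low:.2f} to {high:.2f}")  # se 0.043, 95% interval 0.32 to 0.48
```

So the seed-level rates themselves are fairly stable, give or take several percentage points; it's any single team's result that is all over the map.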

Post by **mister d** » Tue Mar 13, 2018 9:46 am

I wouldn't be positive seeding mistakes are evenly distributed. The committee has biases at an entity level (even if members turn over), assuming they use similar tools and follow their own historical trends year to year. Like it would seem completely illogical for the winner of a 1st-round matchup to have a worse 2nd-round record when it's the higher seed than when it's the lower seed, but the 7/10 has that. #7s are .309 in the 2nd round (.295 against #2s) while #10s are .451 (.391). Sooo, circling back, a #10 losing to the #7 isn't worse than winning is good, but you can project that at least one #10 has been underseeded to the point they're a serious threat to run through the #2.
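Combining those numbers gives a rough sense of how much the 7/10 oddity matters for reaching the Sweet 16. A sketch that simply multiplies the historical rates from the thread, which treats the two rounds as independent and ignores who the 2nd-round opponent actually is; the helper name is hypothetical.

```python
def reach_sweet16(p_round1, p_round2):
    """Chance of winning two games, assuming the rounds are independent."""
    return p_round1 * p_round2


p7_round1 = 0.614            # 7 seeds' historical record vs. 10 seeds (from the thread)
p10_round1 = 1 - p7_round1   # 0.386

# Overall 2nd-round records from the thread: #7s .309, #10s .451.
p7 = reach_sweet16(p7_round1, 0.309)
p10 = reach_sweet16(p10_round1, 0.451)
print(f"7 seed {p7:.3f}, 10 seed {p10:.3f}")  # 7 seed 0.190, 10 seed 0.174
```

Under that simplification the 7 seed is still a bit more likely to reach the Sweet 16, even though the 10 seeds that survive do noticeably better in round 2.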

Post by **Steve of phpBB** » Tue Mar 13, 2018 9:52 am

> **mister d wrote** (Tue Mar 13, 2018 9:46 am):
>
> I wouldn't be positive seeding mistakes are evenly distributed. The committee has biases at an entity level (even if members turn over), assuming they use similar tools and follow their own historical trends year to year. Like it would seem completely illogical for the winner of a 1st-round matchup to have a worse 2nd-round record when it's the higher seed than when it's the lower seed, but the 7/10 has that. #7s are .309 in the 2nd round (.295 against #2s) while #10s are .451 (.391). Sooo, circling back, a #10 losing to the #7 isn't worse than winning is good, but you can project that at least one #10 has been underseeded to the point they're a serious threat to run through the #2.

I think I understand what you're saying, but can you clarify what you think are the implications of seeding mistakes not being evenly distributed?

Post by **mister d** » Tue Mar 13, 2018 10:07 am

Either accidental, where the committee might view a certain type of team (late risers, teams with mediocre records but missing a star player who is now back) as "#10s", or intentional, where they either invert their 7/10 or intentionally create a high-risk 5/12 matchup knowing that particular game draws tons of attention.

Post by **Pruitt** » Mon Aug 23, 2021 9:09 am

Larry Fitzgerald may retire

But as great as he is and has been, this stat is astounding:

Even more mind-boggling is that on 2,263 targets, the veteran wideout has only dropped 29 passes. He has more career tackles (39) than drops.

Incredible when you first read it, and it gets more incredible the more it sinks in.
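The arithmetic behind the quote, using only the figures in it:

```python
targets = 2263
drops = 29
career_tackles = 39

drop_rate = drops / targets
print(f"drop rate {drop_rate:.2%}, about 1 drop per {targets / drops:.0f} targets")
# drop rate 1.28%, about 1 drop per 78 targets
print(f"tackles minus drops: {career_tackles - drops}")  # tackles minus drops: 10
```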

Post by **Shirley** » Mon Aug 23, 2021 12:12 pm

Ha, I completely forgot about this thread. I started it just a few days before UVA's historic first-round loss. And I'm sure I was thinking at the time about expected tournament results because of Tony Bennett's reputation (unfair, I thought then and still think) as an underachiever in the tournament. I'm pretty sure I dropped this plan after the UMBC loss!

Post by **Pruitt IV** » Wed Feb 01, 2023 12:59 pm

In only one season where Tom Brady started more than one game did his team fail to make the playoffs. That was 2002, when the Pats went 9-7 and he led the league in TD passes.
