Wednesday, April 8, 2015

RPI and SoS

Time for my annual rant on RPI and SoS as evaluation tools in the bracketology process.  Fair warning, this runs long, so I'll bold the important statements throughout.

People always misunderstand the original purpose of the RPI (the selection committee included).  The RPI was never meant to be more than a blunt instrument, an approximator of teams' worth.  The problem is, once people see the number, they expect it to be definitive.  The RPI was never meant to be definitive.  It's the public's fault for misinterpreting what the RPI was supposed to mean.  I feel like I'm doing a public service announcement every year when I say this.

We must get out of the business of using RPI as a whole, IMO.  Even when we de-emphasize a team's own RPI, we're still emphasizing the RPI of their opponents:  we look at record vs. the top 50, vs. the top 100, and so on.  By using the RPI as a grouping tool, we're still subjecting the process to approximations.  It would be much better to come up with a measurement with a sliding-scale impact.  Wins against great teams should be worth more than wins against bad teams, but instead of just dumping them into a column, we should assign a value to each individual win.  The public is too willing to blindly look at the record vs. the Top 50 and assume it's an ironclad statement of a team's worth.

I've heard people complain that the RPI is flawed because 75% of the formula is based on who you play instead of your own record.  The percentages are technically right - 25% of the formula counts your record, 50% counts your SoS, and 25% counts your opponents' SoS - but the conclusion people draw from them is wrong.  People look at the percentages and think the first category and the third category have the same impact.  They don't, because the realistic spread of values within each category is very different.

Let's look at the numbers more closely.
1) 25% of the RPI is a team's winning percentage.  A perfect team scores an even .2500 in this metric (hi, Kentucky).  A great team like Iowa St (25-8) scores around .1894.  A good team, like, for example, Providence (22-11), scores around .1667.  A bad power conference team (let's say Washington St at 13-18) scores around .1049.  The difference between the greatest team and the worst team is .2500, and the difference between a generally good team and a generally bad team is somewhere between .0400 and .0700 (Providence vs. Washington St works out to about .0618).

2) 50% of the RPI is a team's strength of schedule.  The #1 SoS in the country is Kansas's, and it's credited with .3135 in the formula.  The #351 SoS in the country is Alabama St's, which gets .1928.  The difference between the best and worst SoS is .1207.

The #75 SoS is La Salle's, which gets credit for .2724 in the formula.  The #225 SoS (Lamar) gets credit for .2378.  The difference between a generally good SoS and a generally bad SoS is therefore around .0350 - which means that in practice, SoS has less overall impact on the RPI than a team's own record, despite its double weighting.

3) 25% of the RPI is the opponents' strength of schedule.  It should be abundantly clear right off the bat what will happen between the best and worst teams in this category.  Since every team on a schedule has its SoS averaged in with everyone else's, there simply isn't much difference between a good team and a bad team.  This is where playing in a great conference or a bad conference gives you a small advantage or disadvantage, but for the most part, the impact this metric has on the overall RPI is negligible.
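A quick sketch makes the spread argument concrete.  The function name below is mine, and the figures are the ones quoted above (not an official implementation):

```python
# Sketch of the RPI weighting: 25% winning pct (WP), 50% strength of
# schedule (SoS), 25% opponents' SoS.  Figures are the 2014-15 examples
# quoted in the text; the function name is illustrative.

def rpi_components(wp, sos, opp_sos):
    """Return the three weighted RPI terms."""
    return 0.25 * wp, 0.50 * sos, 0.25 * opp_sos

# Spread of the WP term across the whole field (perfect vs. winless team):
wp_spread = 0.25 * (1.000 - 0.000)    # .2500

# Spread of the weighted SoS term (No. 1 Kansas vs. No. 351 Alabama St):
sos_spread = 0.3135 - 0.1928          # ~.1207

# The middle of the field is far tighter (No. 75 La Salle vs. No. 225 Lamar):
mid_sos_gap = 0.2724 - 0.2378         # ~.0350

# Despite SoS carrying twice the nominal weight, the WP term has the
# widest realistic range, which is the point made above.
assert wp_spread > sos_spread > mid_sos_gap
```

Running the numbers this way shows why the "75% is who you play" complaint misses:  the nominal weights say nothing about how wide each term's realistic range actually is.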

With that out of the way, let's look at SoS in deeper detail.  There are 351 D1 teams.  There are many good teams, but I think we can come to a consensus that the bottom 150-200 teams are not good compared to the top 100 or so, and are more or less interchangeable.  Obviously, some teams ranked 201+ in RPI are better than others, but say you're Notre Dame, or Iowa St, or Kentucky.  You'd be expected to beat every team ranked 201 or worse, and there's not much difference for you between playing RPI #201 and RPI #351.  For most good teams, and even for most bubble teams, there just isn't much difference between opponents once you reach the bottom third of D1 basketball.

Here's the problem with SoS - there IS a big difference between RPI #201 and RPI #351 when it comes to the numbers.  Here's an example to illustrate the point.  Dartmouth was 14-14 this year, so the SoS hit for playing them is actually decent - they're .500.  San Jose St went winless against D1 this year, so the SoS hit is catastrophic - they're .000.  If you're a top 15 team, you're beating both Dartmouth and San Jose St handily.  However, according to the RPI, there's an enormous gulf between playing Dartmouth and playing SJSU.  In fact, if you're, say, Iowa St, the difference between playing Kentucky and Dartmouth this year is the EXACT same as the difference between playing Dartmouth and SJSU.  On the court, the difference between UK and Dartmouth is very large, and the difference between Dartmouth and SJSU is smaller.  On paper, the RPI treats the two gaps as equal.  That's a problem.
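A back-of-the-envelope calculation shows how linear that gulf is.  The winning percentages are idealized from the examples above (Kentucky treated as 1.000, Dartmouth .500, San Jose St .000 against D1):

```python
# Each opponent feeds its winning pct into your SoS average, so the
# "distance" between opponents is purely linear in their records.
# Idealized figures from the examples above.

kentucky, dartmouth, sjsu = 1.000, 0.500, 0.000

gap_elite_vs_ok = kentucky - dartmouth   # 0.500
gap_ok_vs_awful = dartmouth - sjsu       # 0.500

# To the RPI, swapping Dartmouth for SJSU costs exactly as much SoS as
# swapping Kentucky for Dartmouth gains, even though on the court the
# first gap is enormous and the second is small.
assert gap_elite_vs_ok == gap_ok_vs_awful
```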

The end result of this effect is this:  it's more important to avoid really bad teams than it is to play good teams.  There are two elements to building a good schedule - scheduling good teams and avoiding bad ones - and the RPI forces teams to overemphasize bad-team avoidance.  The net effect is that a team has every incentive to play as many good-but-not-great opponents as possible:  schedule enough teams that sit just above .500 and you can build a really good SoS without actually playing a top 25 team.  Conversely, you can play a couple of top 25 teams and erase all the benefit by also playing a couple of bad ones.

Look at Notre Dame.  They played Michigan St, UMass, Purdue, and Providence.  Not the greatest schedule, but not awful.  However, their non-conference SoS was 319.  Why?  Binghamton (RPI 332), Coppin St (311), Grambling (351), Chicago St (333), and FDU (312) destroyed their average.  The bad-team effect ruined them.

Compare Notre Dame to Clemson.  Clemson's four toughest non-conference games were LSU, Arkansas, South Carolina, and High Point - weaker than Notre Dame's, for sure.  We can agree on that.  However, their non-conference SoS was 187.  Why?  They played FAMU (RPI 350) and Nevada (301), but everyone else was inside the RPI top 210.  Winthrop, Gardner-Webb, Oakland, and Rutgers weren't the awful hits that Notre Dame's cupcakes were.  Both Clemson and Notre Dame should have handled every non-conference opponent outside their top four, but because Clemson's cupcakes included two Big South title contenders and a Horizon contender, instead of teams that sank to the basements of their leagues, Clemson's SoS came out over 120 spots better.

The solution to this effect?  Another sliding-scale implementation.  We must find a mathematical way to limit the damage a single bad team can do to an SoS, and a way to mathematically reward teams for playing the best of the best.  Right now the RPI is a linearly scaled metric:  the distance between a perfect team and a .500 team is the same as the distance between a .500 team and a winless team.  As a result, teams worry more about bad-team avoidance and stacking decent opponents than about simply playing better teams.  RPI and SoS emphasize the wrong parts of a team's resume.  We need to adjust the formula.
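Purely as an illustration of what a sliding scale could look like (my own sketch, not a proposed official formula):  squaring each opponent's winning percentage before averaging compresses the bottom of the scale and stretches the top.

```python
# Hypothetical sliding-scale SoS: square each opponent's winning pct
# before averaging.  Squaring is just one of many possible transforms;
# the point is only that a nonlinear scale can cap bad-team damage
# while rewarding elite opponents.

def sliding_scale_sos(opponent_wps):
    return sum(wp ** 2 for wp in opponent_wps) / len(opponent_wps)

# Under the linear RPI, Kentucky -> Dartmouth and Dartmouth -> SJSU are
# identical 0.500 steps.  Under the squared scale:
uk, dartmouth, sjsu = 1.000 ** 2, 0.500 ** 2, 0.000 ** 2
elite_gap = uk - dartmouth    # 0.75: playing an elite team matters more
awful_gap = dartmouth - sjsu  # 0.25: a terrible opponent hurts less
assert elite_gap > awful_gap
```

Any transform with this shape fixes both incentives at once:  scheduling Kentucky helps more than avoiding SJSU hurts.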

Tuesday, April 7, 2015


CBI and CIT

Time for my annual CBI/CIT suggestion post.  I won't say much; if you look back at last year's post in the archives, you'll see a similar rant.

My bottom line is this:  we need a 3rd postseason tournament behind the NIT.  There are many good teams who deserve a postseason but don't make the NIT - just this year, teams like Yale missed out.  A third tournament gives most, if not all, 2nd and 3rd place finishers in smaller conferences the chance to play in the postseason.  I think we can agree that these tournaments are perfect for rewarding teams that came up just short of winning their smaller conference.

However, we don't need 2 of them.  We need one, the CBI or the CIT, not both.  Right now these two tourneys add 48 postseason teams, stretching the field to 148 across all of D1.  That's too many.  I'd rather add 16 or 24 teams instead of 48, narrowing it down to, say, 116 postseason teams across all of D1 (almost a perfect 1-in-3 ratio).

My preference:  kill the CBI.  Power conference teams almost always decline it, and I'd be okay with keeping power conference teams out of the CIT after they get passed over by the NIT.  They don't need the CIT anyway.

The CIT is a celebration of good mid-major basketball, and should continue.  I'm not sure it needs 32 teams, though.  If it stays at 32, I'm okay with that, but 16 or 24 is fine too.  There should always be room for teams like this year's Yale, Chattanooga, Cleveland St, Georgia Southern, et al.  All those Big South contenders this year?  More than 2 of them deserved a postseason (and they got one this year).  All those MAC contenders?  They deserve more.  And so on.  Let's trim the fat, so that we don't have to admit a lousy Colorado team or marginal mid-majors.  We can have the best of both worlds here.

Rules ideas

Everyone wants to propose changes to "fix" the game of college basketball.  However, the sport doesn't need widespread changes - just band-aids.  Here is my personal modest proposal to fix the issues of low scoring and too many breaks in play.

1) Shot clock to 30 seconds, from 35 seconds - no brainer.
2) Media timeouts every 5 minutes instead of every 4 - instead of 4 media timeouts per half, you get 3.  To compensate for the lost commercials from the 4th media timeout, extend each of the remaining 3 by 30 seconds.  This has the added side benefit of baiting coaches into calling timeouts during the game instead of hoarding them all for the final 2 minutes.
3) Ban timeouts after a made basket - the vast majority of late-game timeouts happen when a team makes a basket and immediately calls time to set its press defense.  Ban it.  Once the ball is in the hoop, it isn't yours, and you shouldn't be able to call timeout.

There you go.  3 simple fixes to help speed up the game.  It's not hard, NCAA.

Monday, April 6, 2015

A random poll

This is the only time of the year when the USA Today poll actually matters.  There's a reason we avoid all mention of rankings during the regular season.  But the final post-tourney poll kinda matters, as a matter of record more than anything.

This is an unbiased opinion on what the final poll should read.  If you disagree, you're a moron.

1) Duke
2) Wisconsin
3) Kentucky
4) Arizona
5) Notre Dame
6) Gonzaga
7) Michigan St
8) Villanova
9) Virginia
10) North Carolina
11) Oklahoma
12) Louisville
13) Wichita St
14) Kansas
15) Utah
16) Northern Iowa
17) Iowa St
18) Baylor
19) West Virginia
20) Maryland
21) North Carolina St
22) Xavier
23) Georgetown
24) Butler
25) Arkansas
26) UCLA

Thursday, April 2, 2015

The NCAA fixed its regional site issue!

I was all set to complain about how the NCAA chooses its regional sites.  This year, we had sites in Portland and Seattle, right next to each other.  Then we had Pittsburgh, Columbus, and Louisville in close proximity.  This led to severe under-representation in the South, and an unhealthy clumping of top Great Lakes-area teams in the bracket.

So, naturally, I was ready to rant and pick apart the 2016 regional assignments, but...crap.  They figured it out.  Mostly, at least.  Your 8 sites:

Providence (northeast/New England area)
Brooklyn (northeast)
Raleigh (your standard North Carolina-based regional)
St Louis (midwest)
Des Moines (midwest)
Oklahoma City (midwest, leaning towards the south)
Denver (mountain time zone)
Spokane (northwest)

This is reasonably balanced.  I would probably prefer the Des Moines or St Louis site to be closer to Indiana or Ohio, and maybe a southern site near Georgia/Florida in place of one of the northeast sites.  But overall, this will serve the most teams well.  In particular, we've moved away from having two or three sites in close proximity.  Well, I guess St Louis and Des Moines are kinda close, but they'll be serving different teams.

In an ideal world, my setup would go as follows:

1) a northeastern site (Providence, Hartford, Boston, New York, Syracuse, etc.)
2) a mid-eastern site (D.C., Baltimore, Philly, Richmond, etc.  Perhaps Pittsburgh or Cleveland, if the Great Lakes site is closer to Milwaukee)
3) a southeastern site (anywhere in Florida or North Carolina - in fact, alternate between the two states.  This might overlap with the mid-eastern site in #2, so be careful)
4) a southern site (Knoxville, Birmingham, New Orleans, St Louis, Louisville, etc.)
5) a Great Lakes site (Chicago, Milwaukee, Detroit, Columbus, etc)
6) a true midwestern site (Omaha, OKC, anywhere in Texas.  If you do choose Texas, make sure the southern site at #4 is closer to Knoxville or Louisville)
7 and 8) any two of the following:  a California site (LA, Oakland, San Diego), a mountain zone site (Albuquerque, Denver, SLC), and a northwest site (Seattle, Portland)

This would provide the greatest balance and serve the most teams.

We're actually in really good shape, because in 2017, these are the actual sites:
1) Buffalo (northeastern site)
2) Greensboro (your NC-based/mid-eastern regional)
3) Orlando (southeast site)
4) Indianapolis (Great Lakes)
5) Milwaukee (midwestern, although trending northern)
6) Tulsa (another midwestern site, but more southern)
7) Salt Lake City
8) Sacramento

2017 actually follows my script well.

So the NCAA is slowly figuring out how to assign regionals with this whole pod system.  Good.