Friday, March 21, 2014

A note on RPI

Some final thoughts on how RPI is used.

The selection committee has done a good job of ignoring individual teams' RPI in the selection process.  However, the RPI still matters when it comes to assembling lists of records vs. RPI Top 50, Top 100, and so forth.  If a metric is not good enough to be used for an individual team, but is good enough to be used to group said teams, isn't that contradictory? 

This is why I like to look at average RPI win and average RPI loss a bit more.  This helps smooth over the arbitrary cutoffs in those groupings.  Beating an RPI 51 team twice is fundamentally different from beating an RPI 100 team twice, yet both show up as 2-0 vs. the Top 100 and neither counts vs. the Top 50.  Average RPI win is the only stat that gets that difference to show up in the data without looking at the actual list of results.
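To make that concrete, here is a minimal Python sketch.  The two result lists are made up for illustration (they are not any real team's schedule), and the function names are my own.  Both hypothetical teams have identical Top 50 and Top 100 records, but their average RPI win differs.

```python
from statistics import mean

# Hypothetical results: (opponent RPI rank, True if the game was a win).
# Made-up numbers for illustration only, not a real team's schedule.
team_x = [(51, True), (51, True), (20, False), (180, True)]
team_y = [(100, True), (100, True), (20, False), (180, True)]

def average_rpi(results, won):
    """Mean opponent RPI rank over the wins (won=True) or the losses (won=False)."""
    ranks = [rank for rank, is_win in results if is_win == won]
    return mean(ranks) if ranks else None

def top_n_record(results, n):
    """(wins, losses) against opponents ranked inside the RPI top n."""
    wins = sum(1 for rank, is_win in results if rank <= n and is_win)
    losses = sum(1 for rank, is_win in results if rank <= n and not is_win)
    return wins, losses

for name, results in (("X", team_x), ("Y", team_y)):
    print(name,
          "vs Top 50:", top_n_record(results, 50),
          "vs Top 100:", top_n_record(results, 100),
          "avg RPI win:", round(average_rpi(results, True), 1))
```

The grouped records come out the same for both teams; only the average RPI win separates them.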

Which brings me to a fundamental issue with RPI and "bad losses" and "good wins". 

Let's take 4 teams.  Let's say Team A is undefeated, 1.000 winning percentage (30-0) and Team B is a good .800 team (24-6).  Team C is a bad .300 team (9-21) and Team D is a really bad .050 team (1-19).  Records are uneven, but whatever, this is an illustrative example.

Because the RPI formula is linear in opponents' winning percentage, it sees the gap between Teams A and B (.200) as about the same size as the gap between Teams C and D (.250).  In public perception, however, the difference between beating Team A and beating Team B is big.  Now, beating either Team A or B would be a signature win, but one is more signature than the other.

Now look at what happens with a win against Team C or D.  In either case, the public perception of the team doesn't change.  They beat a bad team.  However, the RPI sees a difference between beating the two teams, one at least as large as the difference it sees between Teams A and B.
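A rough Python sketch of why that happens.  It uses the standard RPI weights (25% own winning percentage, 50% opponents' winning percentage, 25% opponents' opponents' winning percentage) but simplifies the rest: no home/road weighting, and no removal of head-to-head games from the opponents' records.  The 29 filler opponents at .500 and the team's own .750 record are assumptions for illustration.

```python
def simplified_rpi(wp, opponent_wps, oowp):
    """Simplified RPI: 0.25*WP + 0.50*OWP + 0.25*OOWP, with no site weighting."""
    owp = sum(opponent_wps) / len(opponent_wps)
    return 0.25 * wp + 0.50 * owp + 0.25 * oowp

# Assumed schedule: 29 average opponents plus one slot we swap around.
FILLER = [0.500] * 29
TEAMS = {"A": 1.000, "B": 0.800, "C": 0.300, "D": 0.050}

def rpi_with(opponent):
    """RPI of a hypothetical .750 team whose 30th opponent is the given team."""
    return simplified_rpi(wp=0.750,
                          opponent_wps=FILLER + [TEAMS[opponent]],
                          oowp=0.500)

gap_ab = rpi_with("A") - rpi_with("B")  # playing A instead of B
gap_cd = rpi_with("C") - rpi_with("D")  # playing C instead of D

print(f"A instead of B: +{gap_ab:.4f}")
print(f"C instead of D: +{gap_cd:.4f}")
```

Each point of opponent winning percentage is worth the same no matter where on the scale it sits, so swapping D for C moves the number at least as much as swapping B for A.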

Let's say Team C has a 225 RPI and Team D has a 350 RPI (reasonable).  From public perception, the difference in wins is negligible, and a loss against C is just as harmful as a loss against D.  But according to the RPI's perception, the difference between C and D is large.

And therein lies the problem.  The public perception says any win over a team outside the top 150 is mostly useless in evaluation.  However, from the RPI formula's point of view, there's a big difference between a win over an RPI 175 team and a win over an RPI 325 team.  This results in distorted RPIs that punish teams far too much for playing bad teams and don't reward teams enough for playing great teams.

What the RPI needs is a weighted adjustment.  There shouldn't be a big difference between playing an RPI 225 team and an RPI 350 team.  We should scale down the effect really bad teams have on RPIs compared to the merely below-average ones.  Similarly, we should be able to scale up wins against terrific teams.  Right now teams benefit more from avoiding bad teams than from scheduling good teams.  We need to emphasize scheduling great opponents, while de-emphasizing the need to purge every single cupcake from the schedule.
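One way such an adjustment could look, as a sketch only.  Nothing here is part of the actual RPI: the function name, the .300 and .700 cutoffs, and the two scale factors are all numbers I picked for illustration.  The idea is to transform each opponent's winning percentage before averaging, compressing the bottom of the scale and stretching the top.

```python
def adjusted_wp(wp, floor=0.300, ceiling=0.700, low_scale=0.25, high_scale=1.5):
    """Hypothetical reweighting of an opponent's winning percentage.

    Below `floor`, differences count for only `low_scale` of their raw size.
    Above `ceiling`, differences count for `high_scale` times their raw size.
    In between, the value is left alone.  The result is a score, so it may
    exceed 1.0 for the very best opponents.
    """
    if wp < floor:
        return floor - low_scale * (floor - wp)
    if wp > ceiling:
        return ceiling + high_scale * (wp - ceiling)
    return wp

teams = {"A": 1.000, "B": 0.800, "C": 0.300, "D": 0.050}
adjusted = {name: adjusted_wp(wp) for name, wp in teams.items()}

raw_gap_cd = teams["C"] - teams["D"]
adj_gap_cd = adjusted["C"] - adjusted["D"]
raw_gap_ab = teams["A"] - teams["B"]
adj_gap_ab = adjusted["A"] - adjusted["B"]

print(f"C vs D gap: raw {raw_gap_cd:.3f}, adjusted {adj_gap_cd:.4f}")
print(f"A vs B gap: raw {raw_gap_ab:.3f}, adjusted {adj_gap_ab:.3f}")
```

With these made-up settings the gap between the bad team and the really bad team shrinks to a quarter of its raw size, while the gap between the great team and the good team grows by half.  Where the cutoffs and scales should really sit is exactly the part someone would need to study.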

This is something I hope someone takes a look at.  What happens if you replace every horrible team on SMU's schedule with, say, the RPI 225 team?  Take the 6 or so horrible teams, replace them with merely bad teams, give SMU easy wins in all of them...what happens to their SoS and RPI?  Do they make it in the tourney?  Perhaps.  And yet, SMU would have ended up with the exact same on-court results against either schedule.
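The experiment could be set up along these lines.  To be clear, the schedule below is invented and is not SMU's actual schedule; the .200 cutoff and the .300 stand-in for "the RPI 225 team" are also assumptions.  The sketch only recomputes the opponents' winning percentage piece (which carries 50% of the RPI) and ignores the knock-on change to opponents' opponents.

```python
def owp(opponent_wps):
    """Average opponents' winning percentage (the 50% component of RPI)."""
    return sum(opponent_wps) / len(opponent_wps)

def swap_worst(opponent_wps, cutoff, replacement):
    """Replace every opponent below `cutoff` with a team at `replacement`."""
    return [replacement if wp < cutoff else wp for wp in opponent_wps]

# Invented 12-game schedule: six decent opponents, six horrible ones.
schedule = [0.750, 0.700, 0.650, 0.600, 0.550, 0.500,
            0.150, 0.120, 0.100, 0.100, 0.080, 0.050]

# What-if: every horrible team becomes a merely bad (.300) team.
what_if = swap_worst(schedule, cutoff=0.200, replacement=0.300)

owp_change = owp(what_if) - owp(schedule)
rpi_change = 0.50 * owp_change  # OWP carries half the weight in RPI

print(f"OWP: {owp(schedule):.4f} -> {owp(what_if):.4f}")
print(f"Approximate RPI change from OWP alone: +{rpi_change:.4f}")
```

The team's own wins and losses never enter the calculation, which is the point: the on-court results are identical and only the names on the schedule changed.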

Given the expansion of D-1 in recent years, it's worth exploring ways to minimize penalties for playing the worst of the worst.  Non-con scheduling should be about finding the best games, not avoiding the worst games.  The emphasis point needs to change.
