The potential impact of stats on the playoff selection committee
This week, a committee of commissioners and athletic directors is hunkered down in a conference room in Indianapolis selecting the field of 68 teams for this year's NCAA basketball tournament. As usual, the bracket it unveils on Sunday night will likely be met with controversy.
Recently, there's been increasing criticism surrounding the committee's continued reliance on the 32-year-old RPI ratings as the primary measuring stick for teams' accomplishments (records vs. RPI top 50, top 100, etc.). The relatively simplistic formula consists of a team's winning percentage (25 percent), its opponents' winning percentage (50 percent) and its opponents' opponents' winning percentage (25 percent). With the advent of more advanced metrics like Ken Pomeroy's efficiency ratings, among others, many feel the NCAA is rigidly clinging to an archaic system.
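To make the weighting concrete, here is a minimal Python sketch of the three-part formula described above. It assumes plain winning percentages as inputs and ignores any further adjustments the official calculation may apply; the two teams and their numbers are hypothetical.

```python
def rpi(wp, owp, oowp):
    """Weighted blend described in the text: 25% own winning percentage,
    50% opponents' winning percentage, 25% opponents' opponents'."""
    return 0.25 * wp + 0.50 * owp + 0.25 * oowp

# Hypothetical teams (illustrative numbers only, not real records).
# Team A: gaudy record against a soft schedule.
team_a = rpi(wp=0.900, owp=0.450, oowp=0.500)
# Team B: weaker record against a tougher schedule.
team_b = rpi(wp=0.700, owp=0.600, oowp=0.550)

print(f"Team A RPI: {team_a:.4f}")
print(f"Team B RPI: {team_b:.4f}")
```

Because half the weight sits on opponents' records, the hypothetical Team B comes out ahead of Team A despite winning less often. Note that nothing in the formula knows how many points anyone won by.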
Meanwhile, next month in Pasadena, Calif., FBS commissioners will begin constructing their own playoff committee, one that will select its first four-team football field in December 2014. Unlike college basketball, college football has a chance to start from scratch with its methodology. Executive director Bill Hancock said the commissioners are keen on a "data-driven" approach. But he added, "No single ranking could ever be the end-all, be-all for college football. There are just not enough data points."
So far, the commissioners have talked largely in generalities about the committee's directive. We know, for example, that there will be an emphasis on strength of schedule. But what's the best way to measure that? And given that both the current BCS formula and the RPI ratings exclude margin of victory -- a flaw universally derided by statisticians -- will the committee use more mathematically sound ratings systems in the future?
SI.com sought the recommendations of five noted college football and basketball stats gurus to get a better feel for how the 2014 committee should embark on its task. They are:
• Bill Connelly, author of SB Nation's Football Study Hall and Football Outsiders' S&P efficiency ratings
• Ed Feng, publisher of ThePowerRank.com and Sports Illustrated contributor
• Brian Fremeau, author of BCFToys.com and Football Outsiders' FEI efficiency ratings
• Jerry Palm, CBSSports.com bracketologist and former publisher of CollegeBCS.com and CollegeRPI.com
• Ken Pomeroy, publisher of the KenPom.com basketball efficiency ratings
While each offered a unique perspective, their responses coalesced around three primary themes:
"Transparency is important to the commissioners," said Hancock. "Coming out and explaining their selections." That's encouraging, but some say that openness should perhaps begin even sooner.
• Palm: "The most important thing that needs to be included is accountability. Whatever formula they choose, it needs to be open so it can be independently validated, and it needs to be explainable, so that people know how they are being judged. If a formula starts incorporating some higher math elements in an effort to be more accurate, they run the risk of not being able to truly explain what's going on under the covers."
• Fremeau: "The primary thing the committee needs to do is establish a consistent process. I've advocated for a requirement that committee members take the time to retroactively select and seed 20 years of four-team fields. Evaluate their own individual priorities and those of the whole committee through that process. Establish consistency over time."
• Pomeroy: "I don't think there's any right or wrong answers here, but whatever is decided it's important to bracket it on prior seasons, and make those results public, to see if it makes sense."
• Fremeau: "I doubt this will happen, but I think they need to have a non-voting data person in the room as well. Someone to help the members interpret ratings and other data sources, answer questions that are posed and hold the group accountable to information that is shared."
Since 1936, college football's national champion has been determined almost entirely by how few games it lost. Pollsters rarely slot a one-loss team above an undefeated team, unless the undefeated team is from a widely perceived weaker conference. Meanwhile, the current BCS formula does not include margin of victory, a fairly universal method for determining team strength. The experts agree that the committee needs to look beyond mere W's and L's.
• Fremeau: "Wins and losses are only one piece of the puzzle in evaluating teams. How each game was played -- scoring margins, drive and play efficiency, impact on the outcome of less-predictable events like turnovers and special teams -- can be measured. The committee can add in context that a computer cannot, like injuries. All of this could be valuable in making decisions about which teams are in the field."
• Feng: "The most simple metric is margin of victory. Better teams tend to win by more points. There's a strong correlation between average margin of victory and winning percentage. I know this is a political problem, since the committee does not want to encourage running up the score. However, we've seen the downside of not using margin of victory. The current BCS computer polls were not allowed to [incorporate MOV], and they had Notre Dame ranked higher than Alabama before this year's national title game."
• Palm: "There are three things that are necessary in a football metric that aren't universally accounted for now: margin of victory, home field advantage and games played outside [Division] I-A. There are too few games played to allow for the exclusion of MOV. Also, I think MOV is a reasonably accurate indicator of how the teams performed in a football game (except in overtime games). ... And playing [Division] I-AA teams is a choice. That choice needs to be reflected somehow in the strength of a team's schedule."
• Fremeau: "Frankly, I think the committee needs as much data as possible in the room. The drive efficiency work I do and the play-by-play work Bill Connelly does are just a small part of a broad palette of measures available. Use them all."
• Connelly: "We tend to drastically overstate the importance of head-to-head matchups. ... If we've seen two teams play on the same field, we are naturally going to take the result of that battle heavily into account. But that's one week. Teams play 12 games. If Team A loses to Team B (especially by a small margin) but clearly outplays Team B in the other 11 weeks of the season (according to records, eyeballs, and/or computer rankings), Team A deserves a playoff spot over Team B."
Of course, the danger of using advanced stats or ignoring head-to-head results is that the committee might wind up producing a bracket that the majority of the public -- accustomed to seeing rankings ordered largely by team records -- rejects.
• Pomeroy: "It probably seems straightforward that you'd want the four 'best' teams in the playoff, but this isn't really true because we've generally seen that close losses to good teams represent team strength more than a close win against a bad team. Fans, media and players wouldn't really want to see a three-loss team that played a very tough schedule get in over an unbeaten that played a weaker schedule."
Schools like Ohio State are already ramping up future schedules in anticipation of the new system -- but no one has yet said how strength of schedule will be determined. From 1998 to 2003, the BCS standings included a SOS component that mimicked the RPI's aforementioned formula. The NCAA uses a simplistic listing of opponents' cumulative records, while Jeff Sagarin's ratings are widely cited and may well prove valuable for the committee.
The tricky thing, according to our panel of experts, is that the truest ratings of schedule strength involve some pretty intense math. Read carefully.
• Fremeau: "There are many ways to measure schedule strength, and many of them are valid. I like to use this example. Imagine two schedules. Schedule A consists of the six best teams in the country and the six worst. Schedule B consists of the 12 most average teams in the country. Which is tougher? Ask Alabama, and they'll obviously say Schedule A. Alabama would have a much easier time running the table against Schedule B. But ask the worst team in the country which one is easier, and they'll say the opposite. The worst team in the country would have a hell of a time winning a single game against Schedule B. ... So depending on who you are, you can perceive the exact same schedule of teams very differently."
• Connelly: "If we have to judge a team simply on strength of schedule, then I like the way Fremeau approaches it. He basically asks how other teams of a certain level would expect to perform given the same schedule. You need stats more advanced than simply win percentages to do that well, but from a 20,000-foot view, that's the right approach."
• Fremeau: "My approach is to calculate the likelihood of an elite team (defined as two standard deviations better than average) going undefeated against the entire schedule. That's not the only way to measure strength of schedule, and it may not be the best. But it is specific and consistently applied, and that matters most."
• Feng: "Calculating strength of schedule first depends on taking a team's raw stats and accurately adjusting them for their schedule. From these adjusted numbers from each team, one can determine a team's strength of schedule by looking at the adjusted numbers of their opponents.
"Many people make a 'one-step' adjustment for a team's opponents. For example, let's say Alabama has a raw margin of victory of 12.0. Its tough schedule makes for an adjustment to 14.0. The problem is, Alabama's MOV has changed, so you should really do the calculation again for all teams. Most people stop after one step. ... It's better to solve for these variables simultaneously. This requires knowing some linear algebra. But linear algebra is hard, so most people don't do this. ... I was looking around at some college basketball rankings, and I found someone that does a good job using margin of victory and solid math."
• Pomeroy: "You don't necessarily need to consider schedule strength directly since it would be calculated by whatever [rating] system(s) you create ... Now, if you really need a schedule strength ranking, I'd recommend taking the best five to 10 systems out there and average their schedule strength estimate."
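For readers curious about the difference Feng describes between a "one-step" adjustment and solving for every team at once, here is a minimal Python sketch. It assumes a deliberately simple model in which each game's margin equals the difference between two team ratings; the four teams and their results are hypothetical, and this is not presented as any panelist's actual system.

```python
import numpy as np

# Hypothetical results: (winner, loser, margin of victory).
games = [
    ("A", "B", 7),
    ("A", "C", 14),
    ("B", "C", 3),
    ("B", "D", 10),
    ("C", "D", 7),
]
teams = sorted({t for g in games for t in g[:2]})
idx = {t: i for i, t in enumerate(teams)}

# One row per game: +1 for the winner, -1 for the loser; target is the margin.
X = np.zeros((len(games), len(teams)))
y = np.zeros(len(games))
for row, (winner, loser, margin) in enumerate(games):
    X[row, idx[winner]] = 1.0
    X[row, idx[loser]] = -1.0
    y[row] = margin

games_played = np.abs(X).sum(axis=0)
raw_mov = (X.T @ y) / games_played  # average margin, unadjusted

# opp_counts[i, j] = number of games team i played against team j.
opp_counts = np.diag(games_played) - X.T @ X

def avg_opponent(values):
    """Average of `values` over each team's opponents."""
    return (opp_counts @ values) / games_played

# One-step adjustment: raw margin plus opponents' *raw* margins, then stop.
one_step = raw_mov + avg_opponent(raw_mov)

# Simultaneous solution: least squares over all games at once.
# lstsq returns the minimum-norm solution, so the ratings sum to zero.
ratings, *_ = np.linalg.lstsq(X, y, rcond=None)

for t in teams:
    i = idx[t]
    print(f"{t}: raw {raw_mov[i]:6.2f}  one-step {one_step[i]:6.2f}  "
          f"simultaneous {ratings[i]:6.2f}")
```

The simultaneous ratings are self-consistent in a way the one-step numbers are not: each team's rating equals its raw average margin plus the average rating of its opponents, which is the condition repeated re-adjustment is trying to reach.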
Pomeroy may be on to something there. Given that the prospective committee members aren't likely to be up on their linear algebra, and given that fans might not forgive a 10-3 team being elevated over a 12-1 team due to its hypothetical chances of going undefeated, it may be best to form a simple consensus. But then again, the BCS already did this to some extent, throwing together six random computer ratings and expecting fans to accept their legitimacy.
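The "chances of going undefeated" idea is less exotic than it sounds. In its simplest form it is a product of per-game win probabilities. The Python sketch below applies it to Fremeau's two imaginary schedules. It assumes games are independent, and every win probability in it is an invented placeholder rather than output from FEI or any real rating system.

```python
from math import prod

def p_undefeated(win_probs):
    """Chance of winning every game, assuming independent games."""
    return prod(win_probs)

def p_at_least_one_win(win_probs):
    """Chance of winning one or more games, assuming independent games."""
    return 1.0 - prod(1.0 - p for p in win_probs)

# Schedule A: the six best and six worst teams. Schedule B: 12 average teams.
# All probabilities below are hypothetical placeholders.
elite_vs_a = [0.60] * 6 + [0.99] * 6   # elite team's per-game win chances
elite_vs_b = [0.90] * 12
worst_vs_a = [0.001] * 6 + [0.50] * 6  # worst team's per-game win chances
worst_vs_b = [0.05] * 12

print(f"Elite team undefeated vs. A: {p_undefeated(elite_vs_a):.3f}")
print(f"Elite team undefeated vs. B: {p_undefeated(elite_vs_b):.3f}")
print(f"Worst team wins a game vs. A: {p_at_least_one_win(worst_vs_a):.3f}")
print(f"Worst team wins a game vs. B: {p_at_least_one_win(worst_vs_b):.3f}")
```

Under these made-up inputs, the elite team is far more likely to run the table against Schedule B, while the worst team is far more likely to steal a win against Schedule A. The same 12 opponents look tougher or easier depending on who is playing them.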
Attendees at past BCS meetings usually include a handful of highly paid lawyers and TV and p.r. consultants. If they want to assemble this committee correctly, they ought to add a statistician or two to the payroll.