Managers like Girardi now use numbers, but not the right way
Joe Girardi is using Phil Hughes twice in Arlington based on past performance
Hughes had pitched only 15 career innings in Texas in the regular season
Statistics show outcome, not process, and process matters more when the sample size is small
Watch any Yankee game, and you'll see a shot of manager Joe Girardi in the dugout, and not far from him will be a binder the size of David Robertson's postseason ERA. That binder contains staggering amounts of data on every player in that day's game. In an era obsessed with statistics, the more advanced the better, Girardi has been very open about his willingness to mine that data for information he can use to make decisions about who plays and when.
It's an open question, however, whether he's doing it correctly. Looking past RBIs and batting average and looking into matchups are first steps along the road to good decision-making, but the trap that many managers fall into is not recognizing that it takes a lot of matchups, a lot of plate appearances, for information to become significant.
So when Girardi sets his ALCS rotation even in part to take advantage of Phil Hughes' track record in Arlington -- where the right-hander had tossed 15 shutout innings coming into last week's Game 2 -- he's misusing the data. Fifteen innings do not provide enough meaningful information about a player's abilities in any situation to be the basis for action. The words "small sample size" are cited so often that they've lost their ability to get people's attention, but the reality of baseball information is that it takes a lot of time for individual events to add up to collective meaning. When analyst Russell Carleton of the web site Statistically Speaking dug into the numbers, he found that statistics became reliable at various intervals, but that you needed at least 50 plate appearances at the low end (for a player's swing percentage) and that some statistics, such as batting average, weren't reliable over even a full season of play.
Girardi's decision to start Hughes in Game 2 made him the Yankee starter in line to get two starts in Arlington, the second of which comes Friday with the Yankee season on the line. Those 15 shutout innings, though, came against teams dramatically different from the one he'll face. On May 1, 2007, Hughes went up against Michael Young, Ian Kinsler and a Nelson Cruz so unready that he would spend most of the summer back at Triple-A. There were as many players in that lineup now out of organized baseball -- three -- as will be on the field tomorrow. Beating up Kenny Lofton, Brad Wilkerson and Victor Diaz in 2007 should not inform a decision in 2010. Even Hughes' 2009 start in Texas, just 17 months ago, shows how much things change: Just five of the players he faced in that game will be on the field tomorrow night. That's not enough similarity of opposition to determine whether Hughes' past performance is indicative of future results.
Such curious decisions have been a pattern for Girardi throughout the series. In Game 4 on Tuesday night, he let Lance Berkman -- who has lost his limited ability to bat right-handed -- bat against Darren Oliver as the tying run with two outs in the eighth inning and attempted to justify the decision by saying "[Berkman] had had some success." The problem is, Girardi is relying on information that has no value. Yes, Berkman had gone 4-for-6 against Oliver, 4-for-7 if you include an at-bat in Game 2 of the series. (Because of how this data is collected, postseason games are often not part of the matchup data, even though at these levels they can account for 25 percent or more of the head-to-head matchups.) But there is no significance at all to seven at-bats; you may as well make the decision based on the two players' uniform numbers. The seven at-bats certainly don't outweigh Berkman's failure to hit from the right side over his last 200 plate appearances. In modern baseball, individual matchups almost never occur often enough to achieve statistical significance.
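To put a rough number on how little seven at-bats can say, here is a minimal sketch in Python that computes an approximate 95 percent Wilson score interval for a hit rate. It rests on a loud simplifying assumption: every at-bat is treated as an independent trial with the same fixed chance of a hit, which real matchups spread over several years are not. The function name `wilson_interval` is mine, and the 40-for-70 and 400-for-700 lines are hypothetical samples included only to show how the range narrows as at-bats pile up.

```python
import math


def wilson_interval(hits, at_bats, z=1.96):
    """Approximate 95% Wilson score interval for a hit rate.

    Assumes each at-bat is an independent trial with the same
    hit probability (a simplification, not a baseball fact).
    """
    if at_bats <= 0:
        raise ValueError("at_bats must be positive")
    p = hits / at_bats                      # observed hit rate
    z2 = z * z
    denom = 1 + z2 / at_bats
    center = (p + z2 / (2 * at_bats)) / denom
    half = z * math.sqrt(p * (1 - p) / at_bats + z2 / (4 * at_bats ** 2)) / denom
    return center - half, center + half


# 4-for-7 is the matchup from the article; the larger samples are hypothetical.
for hits, at_bats in [(4, 7), (40, 70), (400, 700)]:
    low, high = wilson_interval(hits, at_bats)
    print(f"{hits}-for-{at_bats}: plausible hit rate {low:.3f} to {high:.3f}")
```

Under those assumptions, the 4-for-7 range runs from roughly .250 to above .800: wide enough to cover both a weak hitter and an impossibly good one. The same observed rate over hundreds of at-bats produces a far tighter range, which is the sense in which a handful of matchups cannot separate skill from luck.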
There are other, less mathy problems to be had here. Take that Berkman/Oliver matchup. Prior to Saturday night, the last time the two had faced each other was in 2007. Berkman's last hit off Oliver was in 2003, when Berkman was at the peak of his career and Oliver was nearing the end of the starting-pitcher phase of his before reinventing himself as an effective set-up man. The two players who faced off Tuesday night at Yankee Stadium bore little resemblance to the two who created the four hits that led Girardi to allow Berkman to bat. Many, many batter/pitcher matchup decisions are based on this kind of data, where it stretches so far back as to be about completely different baseball players.
Moreover, statistics show outcomes, rather than process, and when you're talking about small samples and individual matchups, the process matters more. Without going to video, I would have no idea if those four Berkman hits off Oliver were scalded or blooped. Was he fisting 0-2 sliders into right field? Was he getting to 3-1 and squaring up cripple fastballs? "4-for-6" hides more information than it reveals; what we want to know -- does this batter hit this pitcher well? -- requires greater granularity of data: line-drive rates, contact rates, pitch counts. At this level, overall skill sets matter more than the results of a handful of matchups, and even if I wanted the latter, I would get more use out of knowing what a trained observer had seen in those plate appearances than I would from knowing "4-for-6".
Information is neutral. "Lance Berkman is 4-for-8 against Darren Oliver" is a fact. What we do with that fact, however, is most decidedly not neutral. Data-centric decision-making is only helpful if you're using the right data in the right ways, and given everything we know about the variability of baseball statistics, imbuing small samples with meaning to make important decisions is the wrong way.