Behind the numbers
Which 2007 stats projection system is right for you?
Posted: Monday February 19, 2007 3:40PM; Updated: Monday February 19, 2007 3:43PM
With football season over and pitchers and catchers reporting for spring training, Internet forums and blogs are abuzz with talk of how fans' favorite players and teams will do in the upcoming season. In a sense, February is "projection season" for the world of baseball fandom.
For your convenience, FanGraphs has compiled four different projection systems: the 2007 Bill James Handbook projections, Dan Szymborski's "ZiPS," Sean Smith's "CHONE," and the Marcel the Monkey Forecasting System. With all four in one place, you can easily see how the systems differ for any particular player. But first, let's explore some of the nuances of each system.
Tom Tango, the keeper of the Marcel the Monkey Forecasting System, has stated that the Marcels are "the minimum level of competence that you should expect from any forecaster." That's why they've been aptly named after a monkey. They simply use three years of weighted data, regression toward the mean and an age adjustment formula. You could, in fact, recalculate them yourself since the method is completely open source and available at tangotiger.net.
Note that the Marcels only project players who have already played in the majors. Any player who hasn't yet played in the majors is assumed to have a league-average projection.
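To make the three Marcel ingredients concrete (weighted recent seasons, regression toward the mean, an age adjustment), here is a minimal sketch for a single rate stat such as home runs per plate appearance. The function name, the 5/4/3 season weights, the 1,200 plate appearances of league-average padding, and the age constants are all illustrative assumptions, not figures from this article; the published method at tangotiger.net is the authority on the real values.

```python
def marcel_style_rate(seasons, league_rate, age,
                      weights=(5, 4, 3), regression_pa=1200,
                      peak_age=29, young_step=0.006, old_step=0.003):
    """Project a per-plate-appearance rate in the spirit of Marcel.

    seasons: list of (events, plate_appearances), most recent season first.
    All default constants are assumed placeholders for illustration.
    """
    # 1. Weight the last three seasons, most recent counting the most.
    weighted_events = sum(w * ev for w, (ev, _) in zip(weights, seasons))
    weighted_pa = sum(w * pa for w, (_, pa) in zip(weights, seasons))

    # 2. Regress toward the mean by mixing in league-average plate appearances.
    regressed = ((weighted_events + regression_pa * league_rate)
                 / (weighted_pa + regression_pa))

    # 3. Nudge the rate up for players younger than the assumed peak age
    #    and down for players older than it.
    if age < peak_age:
        factor = 1 + young_step * (peak_age - age)
    else:
        factor = 1 - old_step * (age - peak_age)
    return regressed * factor


# A player with no major-league seasons falls back to the league average.
rookie = marcel_style_rate([], league_rate=0.030, age=29)

# A veteran with three seasons of 600 plate appearances each.
veteran = marcel_style_rate([(30, 600), (20, 600), (10, 600)],
                            league_rate=0.030, age=29)
```

With an empty history, the only thing left in the formula is the league-average padding, which is why a player with no big-league record comes out at the league rate in this sketch.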
The Bill James Handbook
There are really two different projection systems at work here: the batter projections and the pitcher projections. Bill James did the batter projections and wrote in the 2006 version of the Handbook: "What a player has done in the past, we predict he will do in the future, modified slightly by age, playing time and park effect."
As for the pitcher projections, James was not involved in any of them and in fact does not believe they can be done. They were instead done by the Baseball Info Solutions team consisting of John Dewan, Pat Quinn, and Damon Lichtenwalner.
In brief, the pitching projections were based on the past eight years of a player's stats, with heavier emphasis on the past three years. Playing time was based on the pitcher's role in the last two months of the most recent season. Some age adjustment was used, and DIPS (Defense Independent Pitching Statistics) theory was taken into consideration. Minor-league stats were used with the help of Ron Shandler's Minor League Equivalency system. There are many other minute details in the pitching system that you can read about in a rather entertaining and informative FAQ located in the 2006 Bill James Handbook.
Sean Smith is the brains behind CHONE and often writes about his efforts on his site, Anaheim Angels all the way. His batter projections are based on four years of weighted data, regression toward the mean, and custom age curves based on player type. Playing time is then adjusted to make things "reasonable."
In his pitching projections, he uses batted-ball data (fly balls, groundballs, etc.) to predict batting average on balls in play and home runs. He uses his own Major League Equivalency system for minor-league statistics. Playing time for pitchers is based on the five most likely starters and six most likely relievers, among whom innings pitched are then doled out.
Dan Szymborski of BaseballThinkFactory.org puts out the ZiPS projections annually. They're based on three or four years of weighted data, depending on a player's age, and use various "growth and decline" curves based on the type of player.
"I don't try to find particularly similar players but instead large groups with similar characteristics, such as K rate for pitchers, Speed Score for batters, [batting average on balls in play] BABIP for batters, handedness, and a lot of other stuff."
The pitching projections take DIPS theory into account, not only by regressing BABIP toward the mean but also by accounting for handedness, knuckleballs, and groundball-to-fly-ball ratios.
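For readers unfamiliar with what "regressing BABIP toward the mean" looks like in practice, the sketch below shows one common way to do it: blend the pitcher's observed rate with the league rate, weighting by sample size. This is not ZiPS's actual formula, which the article does not give; the function name and the `stabilizer` value are hypothetical, and the sketch ignores handedness, knuckleballs, and batted-ball mix entirely.

```python
def regress_toward_mean(observed_rate, sample_size, league_rate, stabilizer):
    """Blend an observed rate with the league rate.

    stabilizer is an assumed number of league-average trials added to the
    sample; the larger it is, the harder the result is pulled to the mean.
    """
    return ((observed_rate * sample_size + league_rate * stabilizer)
            / (sample_size + stabilizer))


# A pitcher who allowed a .320 BABIP on 500 balls in play, in a .300 league,
# with an illustrative stabilizer of 1,500 balls in play.
projected_babip = regress_toward_mean(0.320, 500, 0.300, 1500)
```

Because the assumed stabilizer is three times the pitcher's sample here, the projection lands much closer to the league's .300 than to his observed .320.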