David Roher is co-president of the Harvard Sports Analysis Collective. See his other work at the group's web site. This is his latest installment of a new series of blog posts about sports statistics and strategy. Follow him on Twitter at @davidroher.
The Red Sox are the near-unanimous winners of the offseason after adding the two biggest offensive prizes, Adrian Gonzalez and Carl Crawford, to their 2011 lineup. After an injury-riddled 2010, fans will be happy to see a lineup that looks radically different from the last one they remember.
This got me thinking: is there a way to quantify how much a roster has changed from one year to the next? Boston is in its 12th decade of American League baseball. In that time, 1,612 different hitters have stepped into the batter’s box for the Red Sox and 759 pitchers have toed the rubber. That’s enough data to find some trends. I downloaded roster information from the Lahman database and analyzed Red Sox roster turnover from 1901, the founding of the American League, to 2010.
At first, my inclination was simply to look at the names alone. I’d be asking the question, “how many players who were on the roster the year before are on it again this year?” But this method has a critical problem: it doesn’t take into account which players saw the most action. I doubt many of you are shedding any tears over the loss of Ryan Shealy (seven plate appearances last year), but the departure of Adrian Beltre would sting quite a bit if not for the addition of Gonzalez. That said, I’m not concerned with win production, either. The “look and feel” of the lineup, rotation, and bullpen is what matters.
That’s why I chose to look at plate appearances (PA) and innings pitched (IP). The question I’m actually asking is, “what percentage of plate appearances came from players who were on the roster the year before?” For pitchers, the same question applies to innings pitched. I call this number the “Familiarity Index.” Granted, this isn’t perfect either, as a player could be a scrub one year and a starter the next. There are more advanced ways to tackle the problem, such as measuring how much each player’s PA and IP totals vary from year to year, but they’re more complicated, less intuitive, and no more indicative of any trends, as I found out after trying them.
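To make the definition concrete, here is a minimal Python sketch of the calculation. The function name `familiarity_index`, the dictionary layout, and the player labels and totals are all made up for illustration. It also assumes that "on the roster" means the player recorded at least one entry for the team in the previous season.

```python
def familiarity_index(prev_season, curr_season):
    """Share of this season's playing time that came from returning players.

    Both arguments map a player identifier to his playing time for the team
    (plate appearances for hitters, innings pitched for pitchers).
    Assumption: any player present in prev_season counts as "on the roster".
    """
    total = sum(curr_season.values())
    if total == 0:
        raise ValueError("current season has no recorded playing time")
    returning = sum(
        playing_time
        for player, playing_time in curr_season.items()
        if player in prev_season
    )
    return returning / total


# Hypothetical plate-appearance totals, not real Red Sox data.
prev = {"A": 600, "B": 450, "C": 7}
curr = {"A": 580, "B": 120, "D": 300}

index = familiarity_index(prev, curr)
print(f"Hitter Familiarity Index: {index:.0%}")
```

In this toy example, players A and B return and account for 700 of the 1,000 plate appearances, so the index is 70 percent. Player C's departure does not register at all, and that is the point of weighting by playing time rather than counting names.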
Here are the graphs of my findings over the course of the history of the franchise. The red line represents the fraction of plate appearances that came from players on the roster in the previous year. The blue line represents the same information for pitchers and innings pitched.
Last year’s team had a 76 percent pitching familiarity but a hitting familiarity of 51 percent, the lowest since 1995. Surprisingly, despite the changes in civil rights, player movement, and business over the course of baseball’s history, time period has no apparent relationship with roster turnover. The average Red Sox season sees 73 percent of plate appearances from the previous year’s roster, and 72 percent of innings pitched. The bigger jumps for pitchers are less surprising, given their relatively shorter shelf lives. The least familiar roster was 1946, in which just 14 percent of plate appearances and 25 percent of innings came from players on the roster in 1945. Ironically, this unfamiliarity was likely due to the return of familiar players from World War II, such as Ted Williams. The most familiar year was 1917, in which over 95 percent of both plate appearances and innings came from the 1916 group.
These years, despite their disparate totals, were similarly successful: The Red Sox came in second in the American League in 1917 and won the pennant in 1946. Does any of this have an impact on (or, more accurately, a relationship with) winning? Decide for yourself: here’s a table of how many games the Red Sox have won on average (over a 162-game season) when finishing within certain ranges of familiarity indices.
|Hitter Familiarity Index|Average Record|
|---|---|
|Below 50% (11 teams)|78-84|

|Pitcher Familiarity Index|Average Record|
|---|---|
|Below 50% (10 teams)|78-84|
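Because season lengths have changed over the years, averaging records "over a 162-game season" means prorating each season before grouping it by familiarity. The sketch below shows one way that could work. The helper names, the bucket boundaries, and the sample seasons are hypothetical, and ties are ignored.

```python
def wins_per_162(wins, losses):
    """Prorate a season's win total to a 162-game schedule."""
    games = wins + losses
    if games == 0:
        raise ValueError("season has no decisions")
    return 162 * wins / games


def average_record(seasons, low, high):
    """Average 162-game record for seasons with low <= familiarity < high.

    seasons is a list of (familiarity, wins, losses) tuples.
    Returns (wins, losses) rounded to whole games, or None if the bucket is empty.
    """
    scaled = [wins_per_162(w, l) for fam, w, l in seasons if low <= fam < high]
    if not scaled:
        return None
    avg_wins = round(sum(scaled) / len(scaled))
    return avg_wins, 162 - avg_wins


# Made-up seasons: (familiarity index, wins, losses) on a 154-game schedule.
seasons = [(0.45, 70, 84), (0.40, 77, 77), (0.80, 95, 59)]

below_half = average_record(seasons, 0.0, 0.5)
half_and_up = average_record(seasons, 0.5, 1.01)
print(below_half, half_and_up)
```

Prorating first keeps a 154-game season from counting for less than a 162-game one when the seasons are averaged together.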
Using statistical modeling, I found that for each additional percentage point of plate appearances that comes from the previous year’s roster, the Red Sox win about half a game (.486 wins) more over the course of a 162-game season. The share of innings pitched by returning pitchers has no relationship to winning, positive or negative, nor do the extremes: the success of the 1946 team is an outlier in that respect.
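I haven't spelled out the model here, but the simplest version of this kind of estimate is an ordinary least-squares line with the familiarity index (in percentage points) on one axis and prorated wins on the other. The sketch below assumes that simple setup and uses made-up numbers chosen to fall exactly on a line. It illustrates the technique and does not reproduce the actual analysis.

```python
def ols_slope_intercept(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical data: hitter familiarity (percent) and wins per 162 games.
familiarity_pct = [50, 60, 70, 80, 90]
wins_162 = [75, 80, 85, 90, 95]

slope, intercept = ols_slope_intercept(familiarity_pct, wins_162)
print(f"{slope:.3f} extra wins per percentage point of familiarity")
```

The slope is read the same way as the .486 figure: a team 10 percentage points more familiar would be associated with roughly 10 × .486 ≈ 4.9 more wins over 162 games.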
This doesn’t mean that newcomers to the Red Sox lineup will be bad news in 2011. For one thing, correlation does not equal causation: nothing here shows that familiarity itself is causing more wins. But even if familiarity did cause wins, this lineup will probably be more familiar than you’d think, as Crawford and Gonzalez will likely be the only two Opening Day starters not on the 2010 squad.
A relationship between roster consistency and success is hardly shocking, but actually finding one in the numbers shows that this topic is worth a closer look. Is the same true for other franchises? Are there other, multi-year methods that might lead to even stronger results? I’ll likely be looking at these questions over the next few months, and I encourage you to do the same if you’re so inclined.