So you want to compare the all-time greats but have limited historical data? Not sure what to make of players before the Databall era, when plus-minus wasn’t available and, depending on how far back you go, the box score wasn’t even complete? Don’t worry, you’ve come to the right place.
In the first post in this series, we looked at WOWY data – a simple concept that isolates a player in order to gauge his value by using game-by-game results. But we were left with a fundamental problem — how do we estimate impact for players who weren’t injured or traded? It’s clear Jerry West provided significant lift, but what about someone like Bill Russell? How do we measure his non-box impact?
The answer? Regression, the same statistical method used on play-by-play data in the last few years to create irreplaceable impact metrics like RAPM (Regularized Adjusted Plus-Minus). Since comprehensive play-by-play does not exist before the late ’90s, WOWYR instead regresses WOWY data, or game-by-game plus-minus data. (It’s better than WOWY, so it’s “wowier,” and stands for “With or Without You Regressed.”)
Evidence Beyond Injuries
WOWY score is almost always predicated on injuries, isolating lineups with-and-without players. But there’s far more evidence in the data beyond that.
First, there is indirect evidence for a player when his teammates leave the lineup. Let’s say we wanted to know how much Scottie Pippen contributed to the Bulls’ +9 point differential in the early ’90s. In 1994, when Michael Jordan left the Bulls, we could infer something about Pippen based on the change caused by Jordan’s absence. How?
If Jordan left and the team remained a +9 team, then it would be fairly safe to infer that Jordan was not the reason the Bulls were +9…which tells us that key remaining players on the team, like Pippen and Horace Grant, were the ones responsible for the large point differential.
Conversely, if Jordan left the Bulls and they unraveled into a -5 team, not only would that say amazing things about MJ, but it would mean that the players left behind, like Pippen and Grant, weren’t integral to that +9 differential. Thus, we can make inferences about other players, even when they don’t leave the game-by-game lineup. So while Bill Russell didn’t miss as much time as Jerry West, there’s a bevy of evidence about Russell left by his teammates and all of the time that they missed over the years.
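To make that inference concrete, here is a tiny sketch of the arithmetic behind the two hypothetical scenarios above. It assumes impact is simply additive, and the function name `split_credit` is made up for illustration; the numbers are the what-if values from the text, not real results.

```python
def split_credit(with_player, without_player):
    """Split a team's point differential into the part that left with the
    departing player and the part that stayed with the remaining lineup.

    Assumes a purely additive model, which is a simplification.
    """
    departed_share = with_player - without_player
    remaining_share = without_player
    return departed_share, remaining_share


# Scenario 1: the team stays +9 without the star.
print(split_credit(9, 9))    # (0, 9): the remaining players own the +9

# Scenario 2: the team falls to -5 without the star.
print(split_credit(9, -5))   # (14, -5): the star was worth about 14 points
```

A single comparison like this is noisy, which is why it helps to pool many such comparisons across teammates and seasons.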
Similarly, when two players leave the lineup, it’s not a pure WOWY instance. But again, we can gain valuable insight here too: If the combination of two players leaving caused a team to fall apart, then we can infer that (a) one of those players was making the team excel, or (b) both of them were. Even though it’s unclear who caused it, it’s yet another piece of evidence that can be incorporated with direct and indirect game-by-game information about a player.
Indeed, regression parses all of these scenarios and provides an answer to how much different players impact the game. The result is a single, points-per-game value that estimates a player’s impact over multiple years.
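Here is a minimal sketch of how that parsing can work, assuming each lineup’s point differential is just the sum of the values of the players in it. The three players and all of the numbers are made up for illustration, and `numpy` is my choice of tool, not necessarily what WOWYR was built with.

```python
import numpy as np

players = ["A", "B", "C"]  # hypothetical players, not real data

# Each row is one lineup: 1 means the player was in it, 0 means he was out.
X = np.array([
    [1, 1, 1],  # everyone plays
    [0, 1, 1],  # A out: direct evidence on A, indirect evidence on B and C
    [1, 0, 1],  # B out
    [1, 1, 0],  # C out
    [1, 0, 0],  # B and C both out: joint evidence on the pair
], dtype=float)

# Made-up average point differential for each lineup.
y = np.array([9.0, 4.0, 6.0, 8.0, 5.0])

# Ordinary least squares: find the player values whose sums best match y.
values, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)

for name, value in zip(players, values):
    print(f"{name}: {value:+.1f}")
```

With these toy numbers the fit is exact, so the regression returns +5.0 for A, +3.0 for B and +1.0 for C. Real data is noisy and the lineups overlap far more, so the estimates come with uncertainty.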
Just as the first generation of adjusted plus-minus stats used Ordinary Least Squares (OLS) regression, so does version 1.0 of WOWYR. Follow-up versions will refine the method, but I wanted to start with OLS both for simplicity and so we can see standard errors for each player; a smaller standard error indicates less variability in the player’s estimate.
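For the curious, this is roughly how OLS standard errors fall out of the same setup. It is a sketch on made-up data using the textbook formula (residual variance times the diagonal of the inverse of X'X), not the actual WOWYR code. In this toy example player C misses only one lineup, so his estimate ends up the least certain.

```python
import numpy as np

# Rows are lineups, columns are hypothetical players A, B, C (1 = played).
X = np.array([
    [1, 1, 1],
    [1, 1, 1],
    [0, 1, 1],  # A out
    [0, 1, 1],  # A out
    [1, 0, 1],  # B out
    [1, 0, 1],  # B out
    [1, 1, 0],  # C out (his only absence)
], dtype=float)

# Made-up, noisy point differentials for those lineups.
y = np.array([9.5, 8.5, 4.0, 5.0, 6.5, 5.5, 8.0])

# OLS estimates of each player's value.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Residual variance, using n - p degrees of freedom.
residuals = y - X @ beta
dof = X.shape[0] - X.shape[1]
sigma2 = residuals @ residuals / dof

# Standard errors: square roots of the diagonal of sigma^2 * (X'X)^-1.
cov = sigma2 * np.linalg.inv(X.T @ X)
std_err = np.sqrt(np.diag(cov))

for name, b, se in zip(["A", "B", "C"], beta, std_err):
    print(f"{name}: {b:+.2f} (SE {se:.2f})")
```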
Below are the WOWYR values for every player from 1954-1983 who played at least 450 games. All data is from basketball-reference. Because their data changes slightly after 1983, I’ll be incorporating 1984-present in a future post.
Keep in mind that the only data fed into this statistical model is (a) who played in a game and (b) what the score of that game was. Yet, more than half of the MVPs claimed during the time period fall in the top 11. A search of similar criteria for the time period produces a list of similar All-NBAers. Pretty cool, eh?
Despite filtering for players with a good five seasons or more of playing time, there are still instances where multicollinearity and uncertainty rear their ugly heads. Fortunately, some of these ambiguities (for instance, the early ’80s 76ers, Bucks, Spurs and Sonics) will be ironed out when more seasons are added. Other players, like Walt Frazier, just have a fuzzier signal than most.
Detailed methodology for OLS WOWYR can be found below. I’ll go into more detail in the next post in this series when WOWYR is improved.
- ~25+ mpg to qualify for a lineup (aka ignore lower-minute players)
- Exceptions: if 5th-highest minute player is below 25 mpg, will often take that player and any other in the same mpg range (usually 23-24 mpg) to complete the “lineup.”
- Take the average, unadjusted strength of schedule (based on the full 82-game SRS value of a team)
- Add a home-court advantage factor (3 points)
- All postseason data is included
- Players who fail to qualify for more than 82 games’ worth of lineups are treated as “replacement rotational” players
- “Prime” and “non-prime” seasons are treated as separate players — more on this in the next post.
- Technically, this is Weighted Least Squares (WLS), where weights are determined by games played (the standard inverse-variance weighting).
- Regression is performed on all lineups and their point differentials.
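The steps above can be sketched in a few lines. This is a rough illustration, not the exact WOWYR implementation: the way the schedule and home-court adjustments are combined in `adjusted_margin` is my assumption, the players and numbers are invented, and the minutes cutoff and replacement-player rules are left out to keep it short.

```python
import numpy as np

HOME_COURT = 3.0  # points of home-court advantage, per the list above
players = ["A", "B", "C"]  # hypothetical players

# Each lineup: (players, games, raw avg margin, avg opponent SRS, share of games at home)
lineups = [
    ({"A", "B", "C"}, 60, 8.5, 0.5, 0.50),
    ({"B", "C"}, 12, 5.5, 0.0, 0.75),
    ({"A", "C"}, 6, 5.0, 1.0, 0.50),
    ({"A", "B"}, 4, 7.5, -1.0, 0.25),
]


def adjusted_margin(raw, opp_srs, home_share):
    """Neutral-court, average-opponent margin (assumed form of the adjustment)."""
    # Credit tougher schedules, then remove the net home-court edge.
    return raw + opp_srs - HOME_COURT * (2.0 * home_share - 1.0)


rows, targets, weights = [], [], []
for members, games, raw, opp_srs, home_share in lineups:
    rows.append([1.0 if p in members else 0.0 for p in players])
    targets.append(adjusted_margin(raw, opp_srs, home_share))
    weights.append(float(games))

X = np.array(rows)
y = np.array(targets)
w = np.array(weights)

# Weighted least squares: scale each row by sqrt(weight), then solve as OLS.
sqrt_w = np.sqrt(w)
values, *_ = np.linalg.lstsq(X * sqrt_w[:, None], y * sqrt_w, rcond=None)

for name, value in zip(players, values):
    print(f"{name}: {value:+.1f}")
```

Weighting by games played means a lineup that was together for 60 games pulls the fit much harder than one that appeared for 4, which is the point of using WLS here.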