Backpicks GOAT: Philosophy and Science of Player Evaluation

GOAT lists in sports are fascinating. Criteria vary from person to person and are often fuzzy. No two rankings ever seem to be the same. And because some of us are married to our rankings, most new information is met with a backfire effect — a knee-jerk reaction to push back against novel data and contradictory evidence.

There’s almost no way around this, which is why I wanted the Backpicks GOAT to run with an extremely specific criterion — career value measured in CORP — and present video, historical data and theory for you to update your own rankings, even if CORP isn’t your thing. A CORP-based approach can also reveal some of the philosophical judgment calls we make without even realizing it. For me, finally connecting player quality with CORP shifted my career-value rankings meaningfully, and in some cases, radically.

Take Tracy McGrady (strong peak) vs. Reggie Miller (strong longevity). I always thought Miller was underrated, but I never thought he had a top-40 career. After actually calculating title odds, it was clear that I skewed too hard toward higher peaks. My intuition was that MVP years were worth about three times more than a fringe All-NBA season, and that really great peaks were worth five or six times more. Not only was that an overestimation, but CORP research also demonstrated how valuable “second options” are.1 To my surprise, Miller moved up nearly 15 spots while McGrady moved down (and off this list).

I’ll discuss longevity more in the series post-mortem, but for now I wanted to catalogue precisely how I evaluate players beyond the summary in the top-40 overview. At its heart, this series is really about player evaluations, connecting court to spreadsheet and documenting salient data about a player’s impact. But how do I do that? What’s my science (and art) that produces a player valuation?

Conceptually, the major skill sets in the graphic below (and some minor ones) lay the foundation for a player’s impact on offense (and, on defense, for his ability to negate them). I also apply themes that have emerged from historical research over the years, like the tendency of big men to have a larger defensive impact, or that on-ball scoring, while incredibly important, doesn’t outweigh other core skills like defense and passing. Quantifying someone’s actual, in-era impact and translating it into career value happens in four steps for me.

Step 1: Film Study

Many people call this the eye test, but there are really two distinct elements to this:

First, what we consciously perceive when watching film. I have a good eye for some phenomena, but I miss others. If I watch a play with a sequence of passes that leads to a wide-open shot, and I don’t immediately know who forced that action on offense and who made what decisions on defense, I missed something fundamental. Thankfully, we live in a digital world and can rewind to the beginning of the play with the click of a button. I do this on a lot of plays. (By the way, this is a good way to discover which color analysts have great eyes — they’ll tell you in real time how a play materialized. Weaker analysts simply gawk at what happened on-ball.)

The other “eye test” component is just as important and is discussed extensively in Thinking Basketball: your memory! Recalling dozens of actions in every game is impossible for most people. To combat my glitchy mental hard drive, I write it all down. I have a note-keeping system that I translate into a spreadsheet, allowing for both qualitative and quantitative analysis. For this series, I took over 500 pages of game notes, and here’s what I learned: comprehensively studying one player in a single game requires focus, watching two is mentally taxing, and watching three or four at once is nearly impossible. And you won’t remember stuff from a few games ago.

So if you’re eye-testing games by ball-watching and then relying on memory, you’re going to miss out on areas that traditional metrics struggle to capture, namely passing and team defense. Not coincidentally, most people take issue with players I value differently on defense, and secondarily think I overrate good passers who were lesser scorers.

Practices: (1) Re-watch plays. (2) Take notes to remember them. (3) Track passing, creation, on-ball habits, defensive errors, defensive rotations and man defense.
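As a rough illustration of how those notes turn into numbers, here’s a minimal sketch that tallies hypothetical possession tags into per-player counts. The tag names and entries are invented for the example; the real notes are far more granular.

```python
from collections import Counter, defaultdict

# Hypothetical possession notes: (player, tag) pairs logged while re-watching plays.
# Tags mirror the practices above: passing, creation, on-ball habits, defensive errors, etc.
notes = [
    ("Miller", "off-ball screen navigation"),
    ("Miller", "shot creation"),
    ("McGrady", "defensive error"),
    ("McGrady", "high-value pass"),
    ("Miller", "high-value pass"),
]

# Tally tags per player so a season's worth of notes becomes something you can count and compare.
profiles = defaultdict(Counter)
for player, tag in notes:
    profiles[player][tag] += 1

for player, counts in profiles.items():
    print(player, dict(counts))
```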

Step 2: Contextual Tendencies

This is largely informed by combining the box score (and now, play-by-play or optical data) with film. For instance, once we know the context of James Worthy’s offense — mid-post isolation, beneficiary of layups from Magic Johnson — we can make inferences about his efficiency and passing, especially if we have data on him playing with and without Magic. When these metrics shift in different roles, it reveals strengths and weaknesses in a player’s game. (e.g. Wade’s creation with and without LeBron, or Curry’s percentage of assisted 3s with and without Durant.)

During the series, I reference the Big 3 or Big 4 box dimensions.2 These metrics give players a general “box profile” and lay the foundation for value comparisons on offense only. Passing (separate from creation) cannot always be captured in these stats, nor can scalability. In similar contexts, small differences in these metrics are negligible. But it’s rare for an A-list player to have notably inferior metrics across the board and be a better offensive player.

Practices: (1) Look for tendencies in box data (e.g. unassisted shots, free throw rate, efficiency next to varying teammates) and then (2) compare normalized box profiles to players with similar roles in various environments. Look for changes against different opponents and playoff defenses.
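To make the profile comparison concrete, here’s a small sketch that converts a few box dimensions into z-scores so players in different roles can be lined up side by side. The stat lines below are placeholders, not real data.

```python
import statistics

# Illustrative box profiles (the numbers are invented, not actual stat lines):
# scoring rate, creation rate, efficiency relative to league, turnover rate.
players = {
    "Player A": {"scoring": 27.0, "creation": 5.5, "rel_efficiency": 3.0, "turnovers": 3.1},
    "Player B": {"scoring": 24.0, "creation": 7.0, "rel_efficiency": 4.5, "turnovers": 2.8},
    "Player C": {"scoring": 29.0, "creation": 3.0, "rel_efficiency": 1.0, "turnovers": 2.5},
}

def normalize(players):
    """Convert each box dimension to a z-score so profiles can be compared across roles and eras."""
    dims = next(iter(players.values())).keys()
    normalized = {name: {} for name in players}
    for dim in dims:
        values = [p[dim] for p in players.values()]
        mean, sd = statistics.mean(values), statistics.pstdev(values)
        for name, p in players.items():
            normalized[name][dim] = 0.0 if sd == 0 else (p[dim] - mean) / sd
    return normalized

for name, profile in normalize(players).items():
    print(name, {dim: round(z, 2) for dim, z in profile.items()})
```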

Step 3: Impact Measurements

So-called impact metrics — anything related to plus-minus — are concerned with changes in the scoreboard, and a player’s influence on those changes. In theory, these measurements are everything, the holy grail of player quality. But they are limited by sample size, confounding variables (like the dreaded multicollinearity problem) and most importantly, because they are conditional, measuring the value of a player only on his particular team.

I’m using the word measurement here consciously, because that’s how I’ve come to view these stats. Field goal percentage is a measurement of the shots a player takes (influenced by context and role) and adjusted plus-minus (APM) is a measurement of a player’s correlation with the scoreboard (also influenced by context and role). For high-minute players, APM is often reflective of a player’s situational value, so consistency (or changes) with different teammates can expose strengths and weaknesses, especially when mapped to trends from the first two steps above.

Practices: Compare team results and plus-minus data across different environments, and look for patterns with different teammates and coaches.
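Here’s a toy version of that comparison: splitting hypothetical stint data by whether a key teammate was on the floor and computing a net rating for each split. The stints and numbers are made up purely to show the mechanics.

```python
# Hypothetical per-stint margins, split by whether a key teammate shared the floor.
stints = [
    {"with_teammate": True,  "possessions": 400, "net_points": 32},
    {"with_teammate": True,  "possessions": 350, "net_points": 18},
    {"with_teammate": False, "possessions": 300, "net_points": 6},
    {"with_teammate": False, "possessions": 250, "net_points": -4},
]

def net_per_100(rows):
    """Net points per 100 possessions for a group of stints."""
    possessions = sum(r["possessions"] for r in rows)
    points = sum(r["net_points"] for r in rows)
    return 100.0 * points / possessions

with_star = net_per_100([r for r in stints if r["with_teammate"]])
without_star = net_per_100([r for r in stints if not r["with_teammate"]])
print(f"Net rating with teammate: {with_star:+.1f}, without: {without_star:+.1f}")
```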

Step 4: Valuations

At this point, we know how a player’s counting stats vary in different roles, what his strengths/weaknesses/tendencies are and (roughly) how valuable he’s been in various settings. But coaching also needs to be accounted for. Coaching influences how successful a player is in his role and thus influences his statistical profile. When it’s all laid out, I’ll mentally adjust a player’s portfolio based on coaching.

The easiest comparison is a player to himself, so I weigh his surrounding seasons against each other. While a player’s value on a team can vary greatly from year to year, his actual quality — the theoretical average of all his values from every team — is generally much smoother, governed by aging and health. In other words, I’ll view prime seasons as fairly similar unless there’s a clear reason to think otherwise. This is antithetical to looking at a playoff stat line, doling out huge marks for a monster series or two, then considering the next season a total failure because the same player fired blanks in five playoff games.

The final step is to put it all together and quantify these seasons on a scale. The scale ranges from (roughly) replacement player to (roughly) best player ever. It’s not absolute, because both ends of the spectrum are technically fluid; a new greatest player can come along and extend the scale. Remember, a player whose net impact on the scoreboard is zero is actually pretty good, and a team of such players would usually be a .500 team! Here’s the scale based on real-world APM data, where the all-time best results likely came from favorable situations.3

I then quantify players on a per-game scale (much like plus-minus) on both sides of the ball. For instance, Player A is +3 points on offense and +2 on defense on an average team.4 It’s important to break up both sides of the ball, because fit is different from team to team; some clubs need defenders, others need a panacea for their fledgling attack. I classify offense into five levels of portability (scalability), where the most portable players carry more value on better and better teams.
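To illustrate the portability idea (these are not my actual curves), here’s a sketch in which a player’s offensive value is retained to different degrees on stronger teams, depending on an assumed portability tier. The retention multipliers and the team-strength scaling are invented for the example.

```python
# Invented multipliers: how much of a player's offensive value survives as team quality rises.
# Tier 5 = most portable (spacing, off-ball play); tier 1 = least portable (on-ball heavy).
PORTABILITY_RETENTION = {1: 0.70, 2: 0.80, 3: 0.90, 4: 0.95, 5: 1.00}

def value_on_team(offense, defense, portability, team_srs):
    """Scale offensive value down on stronger teams for less portable players (illustrative only)."""
    # Assumption: full value on an average team (SRS 0), tier-based retention on a +6 SRS team.
    retention = 1.0 - (1.0 - PORTABILITY_RETENTION[portability]) * max(team_srs, 0) / 6.0
    return offense * retention + defense

# A +3 offense / +2 defense player with portability tier 2, on an average vs. a strong team:
print(round(value_on_team(3.0, 2.0, 2, team_srs=0), 1))  # 5.0 on an average team
print(round(value_on_team(3.0, 2.0, 2, team_srs=6), 1))  # 4.4 on a +6 team
```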

Finally, that plus-minus number and portability score is fed into a calculator that estimates a player’s CORP in a given season based on his games played. Every season is summed together to create a career CORP estimation. (This is the number included in each player’s seasonal valuation at the end of his profile.) It’s not always perfect — even five scaling curves is a bit “chunky” — so I examine the results more closely to see if any small nuances compound over a career, adjust for era longevity and season strength, and cringe when two guys end up with nearly identical values.
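For a sense of the mechanics only, here’s a toy aggregation that maps a per-game impact estimate, a portability tier and games played into a seasonal championship-odds share, then sums the seasons into a career figure. The curve and bonus terms below are placeholders; the real calculator’s scaling is different.

```python
# Toy career-CORP aggregation (the actual calculator's curves are not reproduced here).
# Each season: a per-game impact estimate, a portability tier and games played.
seasons = [
    {"impact": 4.5, "portability": 3, "games": 80},
    {"impact": 5.0, "portability": 3, "games": 72},
    {"impact": 3.5, "portability": 4, "games": 60},
]

def season_corp(impact, portability, games, schedule=82):
    """Map per-game impact to a rough championship-odds share, prorated by availability (toy curve)."""
    base = max(impact, 0.0) ** 1.5 / 100.0   # assumed convex relationship between impact and odds
    bonus = 1.0 + 0.05 * (portability - 3)   # small assumed bump for more portable offense
    return base * bonus * (games / schedule) # missed games shave off value

career = sum(season_corp(s["impact"], s["portability"], s["games"]) for s in seasons)
print(f"Estimated career CORP: {career:.2f}")
```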

And that’s it.

  1. Full explanations for why are in this CORP article.
  2. The nomenclature is an homage to a classic psychological concept known as the Big 5 traits. There are really five main box categories historically: (1) scoring rate, (2) creation rate, (3) scoring efficiency, (4) turnover rate and (5) offensive rebounding rate.
  3. More on this idea: If someone’s “true” average impact is +5 points per game on all teams, then on teams where he’s more unique he might be worth 6 or 7 points, and on teams where he’s more redundant, he might be worth +3 or +4. Additionally, there’s some error in the measurement — its precision is not always, say, within 0.2 points. Thus, it’s statistically likely for these APM outliers to come from more favorable situations.
  4. Comparing seasons along this scale is the most revealing part of historical player rankings. It exposes internal inconsistencies and forces reevaluations — once I threw enough seasons on the board, I found indefensible gaps between certain players. My intuition had been cheating me (surprise).
