The Statistical Improbability of Deflate Gate

On Sunday I broke down some of the common misconceptions surrounding the Wells Report, including the social science involved, the statistical misinterpretations and the lack of coherence in the NFL’s story based on its own evidence. Then, on Wednesday, I provided a time-based visualization of all the measurements presented in the Report based on where we’d expect them to be at a given time as the balls warmed up in the locker room. Visually, it’s fairly clear that the Colts balls and Patriot balls have similar issues, as many are “under-inflated” by similar degrees. But what does this mean in terms of probability if we actually run some statistical tests on the data?

To reiterate, time is a major variable in this case because the PSI of the balls was increasing with every minute that they were in the locker room at halftime. Thus, the time that each ball was measured becomes critical in trying to analyze the discrepancy between where a ball was measured and where a ball “should” be using Ideal Gas Law parameters. Below is one such scenario presented in the previous post, in which Patriot balls were measured after 3 minutes in the locker room, measuring a ball took 25 seconds, and it took 3.5 minutes to re-inflate the balls. The blue line is where we’d expect a Colt ball to measure given the time indoors and the gold line where we’d expect a Patriot ball to be:

Deflate Gate Logo Scenario

So our parameters for simulating the actual measuring circumstances (assuming the balls were indeed correctly recorded in order and, as Exponent believes, were set to 12.5 and 13.0 PSI respectively in the pre-game) are:

  • Set Up time (2-4 minutes according to accounts)
  • Measurement time (21.8 – 27.3 seconds per ball)
  • Inflation time (2-5 minutes)
  • Packing time (unstated, but assumed to require some small degree of time between last measurement and re-emergence from locker room)
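These timing parameters pin down roughly when each ball would have been gauged. Below is a minimal Python sketch of that bookkeeping. It assumes (my simplification, not something the Report states) that every ball is read at the start of its slot and that re-inflation happens between the Patriots and Colts measurements. The function and scenario names are illustrative only, and the extreme scenarios use rounded ends of the ranges above.

```python
# Sketch: minutes-in-locker-room at which each ball is gauged.
# Assumptions (not from the Wells Report): every ball is read at the start
# of its slot, and re-inflation happens between the two sets of balls.

def measurement_times(setup_min, seconds_per_ball, inflate_min,
                      n_patriots=11, n_colts=4):
    """Return (patriots_times, colts_times) in minutes after entering the room."""
    per_ball = seconds_per_ball / 60.0
    patriots = [setup_min + k * per_ball for k in range(n_patriots)]
    colts_start = setup_min + n_patriots * per_ball + inflate_min
    colts = [colts_start + k * per_ball for k in range(n_colts)]
    return patriots, colts

# (setup minutes, seconds per ball, inflation minutes)
scenarios = {
    "average of the accounts": (3.0, 25.0, 3.5),
    "early start, fast measure, long inflate": (2.0, 22.0, 5.0),
    "late start, slow measure, quick inflate": (4.0, 27.0, 2.0),
}

for name, params in scenarios.items():
    patriots, colts = measurement_times(*params)
    print(f"{name}: first Patriots ball at {patriots[0]:.2f} min, "
          f"first Colts ball at {colts[0]:.2f} min")
```

Under these assumptions the first Colts reading lands close to the 11-minute mark in all three scenarios, while the Patriots readings are spread over the few minutes before that.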

If we use Exponent’s Ideal Gas Law calculations that assume 71 degrees pre-game (which may be slightly low, as noted in the last post) and add a small “wetness” factor per their report, we can then simulate a bunch of these scenarios to see what was likely and unlikely. The scenario above attempts to average all the accounts. But we can also examine other scenarios: instances where the Patriots balls were tested after 2 minutes or 4 minutes, quickly or slowly re-inflated, etc. If we do that, we’re left with a number of basic permutations we can study:

Deflate Gate p-values

So what do these numbers mean? The “Patriot-Colt mean difference from expected” column calculates where each ball should be based on the time it’s measured, takes the average of all such Patriot balls and subtracts it from the average of all such Colt balls. If we take the mean of all six hypothetical scenarios, the average Patriot ball is about 0.02 PSI below where it should be at the time of measurement relative to the Colts balls (i.e. using the Colts balls as a control group). The p-value is the probability of seeing a gap at least this large from chance alone if both sets of balls come from the same population; the smaller it is, the stronger the case that one set of balls had something done to them that the other set didn’t.
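To make the column and the test concrete, here is a rough Python sketch using only the standard library. It assumes each reading has already been paired with its expected PSI at the moment it was gauged, and it uses Welch's unequal-variance form of the independent t-test, which may not be the exact variant behind the table. The function names are mine, and no Report data is embedded in the block.

```python
# Sketch of the "difference from expected" comparison.
# Assumption: Welch's unequal-variance t-test; the table may have used a pooled variant.
from math import sqrt
from statistics import mean, variance

def residuals(measured, expected):
    """Measured minus expected PSI for each ball (negative means below expected)."""
    return [m - e for m, e in zip(measured, expected)]

def welch_t(group_a, group_b):
    """Return (mean difference a - b, t statistic, Welch-Satterthwaite degrees of freedom)."""
    na, nb = len(group_a), len(group_b)
    va = variance(group_a) / na   # squared standard error of group a's mean
    vb = variance(group_b) / nb   # squared standard error of group b's mean
    diff = mean(group_a) - mean(group_b)
    t = diff / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return diff, t, df
```

Passing the Patriots residuals as `group_a` and the Colts residuals as `group_b` gives a negative mean difference when the Patriots balls sit further below expectation. The p-value then comes from a t-distribution with `df` degrees of freedom, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)` if SciPy is available.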

  • The best-case statistical scenario for the Patriots is that Walt Anderson used the Logo Gauge pre-game, that the balls were measured at 2 minutes, each took about 22 seconds to measure and that the officials took 5 minutes to re-inflate the balls (labeled “Early Start, Fast Measure, Long Inflate” above). That produces a mean where the Patriots balls are higher than the Colts, meaning it’s impossible for the Patriots balls to come from a population that is inherently lower than the Colts balls.
  • Three of the six scenarios in which the Logo Gauge was used pre-game completely exonerate the Patriots
  • The worst-case scenario for the Patriots is that Walt Anderson used the Non-Logo Gauge in the pre-game, and that the balls were measured at 4 minutes, each took 27 seconds to measure and that the officials took 2 minutes to re-inflate the balls (labeled “Late Start, Slow Measure, Quick Inflate” above). That produces a p-value of 0.247, which means that if our assumptions are true, there is a 75.3% chance the Patriots balls come from a different population.

Although it’s far below “statistical significance,” 75.3% might sound like a lot. But what is that number actually saying? For that, we have to look at the observed difference in the averages to put this into perspective: there’s a 75% chance that the 0.3 PSI difference is not simply from variance and is part of a different population (i.e. tampered balls).

Depending on the distribution, 0.3 PSI could easily be 99.99% likely to come from a different sample…which would suggest, what? There’s a 99.99% chance that the Patriots systematically released an average of 0.3 PSI per football? And that’s the worst-case scenario? That strains common sense.

In total, the results (from independent t-tests) show that it was incredibly unlikely that the Patriots balls behaved any differently from the Colts balls using the assumptions presented in the Wells Report. Additionally, in the unlikely event that they are different (roughly a 15% chance if Anderson used the Logo Gauge in the pre-game and 57% if he used the Non-Logo gauge), the degree to which the balls are different is nonsensically small. We would expect a “small” degree of deflation to be something like 1.0-2.0 PSI; the initial reports were “11 more than 2 PSI below regulation,” with another ball falsely labeled by the NFL itself at 10.1. But the data presents a completely different story: the Patriots balls are sometimes higher than the Colts balls relative to what we’d expect, and the worst-case scenario for New England suggests a “non-significant” likelihood of tampering to a degree that is so small it’s equivalent to the variance seen between the two gauges used to measure the balls.

PS If anyone in the statistics community would like the data used here to perform further modeling, please comment below and I’ll provide it. 

 

Follow-Up: The Evidence for Non-Tampering in 2 Pictures

The last post on the cognitive and statistical biases in Deflate Gate included a visual of the “Logo Gauge” scenario that was added to the post based on the timeline provided on page 70 of the Wells Report. In that post I discussed the problems with the Colt balls, but failed to include a visual for the Non-Logo Scenario. Here I’ve presented both to show why the statistical evidence stands heavily on the non-tampering side of things.

Below are the measurements taken at halftime, both from the “Logo” gauge and the “Non-Logo” gauge. Each line is the expected PSI of the balls (blue for Indianapolis, gold for New England) as they heat up during the locker room period. Again, note the Colt balls:

Deflate Gate Logo Scenario

Deflate Gate Non Logo 12.95

The projections use the following assumptions:

  • The temperature indoors was 71 pre-game, 48 degrees outdoors with an atmospheric pressure of 14.636
  • The balls were indeed measured in order (this is alluded to but not explicitly stated)
  • It took 3 minutes in the locker room before testing began (an average of the 2-4 min guess made by Exponent)
  • It took 25 seconds to test a ball (Exponent’s 4-5 min estimation for measuring Patriots balls produces a range of 22-27s)
  • It took 3.5 minutes to re-inflate the Patriots balls (again, an average of Exponent’s 2-5 min estimation)
  • It took just over 90 seconds to pack up and leave, using the assumption that packing would take less time than set up
  • The Patriots balls were all of the same wetness, following a “wet” curve estimated by Exponent’s wet test
  • The Colts balls were all of the same dryness
  • Most importantly, every Patriot ball was exactly 12.5 PSI pre-game and every Colt ball 13.0 PSI pre-game, as claimed by referee Walt Anderson

 

Notice how the Colts balls are “shifted down” below where they should be in a similar manner to the Patriots balls. It’s likely the Logo gauge scenario reflects natural variance we see in measuring actual game-play footballs, as both Patriot and Colts balls aren’t where we’d “expect” based on the assumed parameters Exponent uses and the Ideal Gas Law. (This variance can come from the operator or from other subtle environment factors not captured by temperature and atmospheric pressure. It can also come from the balls not all being perfectly 12.5 pre-game, or from the temperature not being exactly 71 degrees F.) We can say this for two major reasons:

  1. Some Patriots balls are above where we’d expect them based on a 12.5 PSI pre-game measurement and the Ideal Gas Law
  2. 7 of the 8 Colts readings (four balls, two gauges each) are also below where we’d expect them based on a 13.0 PSI pre-game measurement and the Ideal Gas Law

The Colts balls are actually the best evidence for the Patriots, as they are the only four other footballs ever measured at halftime of a game and they show a departure from what’s expected despite not being tampered with. (Note: Here I’m not arbitrarily treating Colts ball #3 as a transcription error as Wells does.)

This essentially exonerates the Patriots in the Non-Logo scenario, which is what Exponent used to reach its conclusion, because in that scenario three of four Colts balls are more than 0.5 PSI under the expected range, with one of four about 0.75 PSI below expected. By comparison, 7/11 Patriot balls were more than 0.5 PSI below expected with 5/11 more than 0.75 PSI under expected. As stated in the last post, Exponent overlooked this because they ignored the variable of time (the balls heating up) and presented all balls as being measured at the same time.

EDIT: Reader “George” astutely noted that simply increasing the indoor temperature pre-game by a degree or three, due to a slight temperature discrepancy between the HVAC and the actual room temperature (from bodies giving off heat in the room), helps explain much of the slightly below-expected values from both teams. In that case, each team’s expected PSI line would be shifted down slightly, helping to explain the results of both sets of balls, as shown below:

Deflate Gate Logo 74 Degrees
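The size of that shift can be estimated directly from the Ideal Gas Law. The Python sketch below is only a rough check, not Exponent's model: it treats the ball as a fixed volume of dry air, uses the 48-degree outdoor temperature and 14.636 atmospheric pressure from the assumptions above, and computes only the on-field starting point of each expected line. The function names are mine.

```python
# Sketch: how the assumed pre-game room temperature moves the expected on-field PSI.
# Ideal Gas Law at fixed volume: absolute pressure scales with absolute temperature.
ATM_PSI = 14.636      # atmospheric pressure quoted in the assumptions above
FIELD_TEMP_F = 48.0   # outdoor temperature quoted in the assumptions above

def fahrenheit_to_rankine(temp_f):
    """Convert degrees Fahrenheit to the absolute Rankine scale."""
    return temp_f + 459.67

def expected_field_psi(pregame_gauge_psi, pregame_temp_f):
    """Gauge PSI after the ball cools from the pre-game room to field temperature."""
    absolute = pregame_gauge_psi + ATM_PSI
    ratio = fahrenheit_to_rankine(FIELD_TEMP_F) / fahrenheit_to_rankine(pregame_temp_f)
    return absolute * ratio - ATM_PSI

for temp_f in (71, 72, 73, 74):
    print(temp_f,
          round(expected_field_psi(12.5, temp_f), 2),   # Patriots starting point
          round(expected_field_psi(13.0, temp_f), 2))   # Colts starting point
```

With these inputs, each extra degree of pre-game room temperature lowers the expected on-field reading by roughly 0.05 PSI, so a 74-degree room puts a 12.5 PSI ball near 11.18 PSI instead of 11.32.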

 

The Cognitive and Statistical Biases of Deflate Gate

I’ve been biting my tongue on Deflate Gate, but the scientist in me reached a tipping point today after reading this Boston Globe article analyzing the Patriots rebuttal to the Wells Report. Simply put: Many of the salient parts of this story aren’t being told properly.

This post will analyze key areas of the Wells Report based on the foundations of this blog (cognition in the context of sports statistics):

  1. The interpretation of context-free communication snippets
  2. The interpretation of memory-based claims
  3. The statistical analysis of the AFC Championship game measurements
  4. The lack of coherence in any proposed tampering scheme

Conclusions and a summary are presented at the bottom for those who want to skip over 4,000 words of details.

1. Communication and Context

a. Ambiguity

Essentially, human beings cannot communicate without context. Think about words like transactions. Deposits. Tellers. Now read the following snippet from a conversation:

Person A: “Pick me up by the bank of the river.”

In all likelihood, you thought of a financial institution. Without any context, that snippet is ambiguous. But the mind will rarely interpret it as ambiguous; talking about things related to a financial institution primes the brain to think of a financial institution and thus interpret “bank” as a building with windows and tellers. Most importantly, one’s instinct is to think it’s something, and not think it’s ambiguous. But what happens when we get to see the full conversation?

Person A: “Are we going rowing tomorrow?”

Person B: “Yes, I’m setting up the boat right now.”

Person A: “Great, so when should we meet?”

Person B: “Let’s say 9. I’ll start at the boathouse.”

Person A: “Pick me up by the bank of the river.”

Person B: “Great — you can just hop on there.”

The conversation is no longer really ambiguous: these two people are talking about an embankment next to the river, not a financial institution.

The majority of the non-statistical case in the Wells Report is based on highly ambiguous texts that were taken out of context. Surely, most people who have been following the story from the beginning don’t think they are ambiguous because most people were thinking about the context of deflating footballs when they read the text messages. They were primed to think this way and as such can no longer see another explanation as even reasonable without oodles of context.

This is a typical case of anchoring and confirmation bias, two of the most powerful mechanisms that govern our decision making. I can shape your opinions by putting information in your head (like the financial institution example above), whether accurate or wrong, and that initial information holds extra weight in your mind (anchoring). Then, once you start to believe something, you start to only look for evidence that supports your story (confirmation bias).

b. “Help the Deflator”

The Globe article takes exception to the Patriots explanation that Jim McNally referenced himself as the ‘deflator’ because he is a big fellow and wanted to lose weight. The author, Ben Volin, responds “It’s hard to find a rational-thinking person in the country who buys this answer.”

That’s just ignorant. And understandably — the science of thinking isn’t exactly taught in high schools.

Volin is under the impression that his mind isn’t heavily anchored to the context of deflating footballs, when for months, he’s only associated the term deflator with this issue. He, like most of us now, probably can’t even think of the word “deflate” without thinking of PSI and footballs. From a cognitive standpoint, that’s predictable.

But not necessarily accurate.

Prima facie the texts reflect incriminating language to those who have been loaded up with the idea that there was a tampering ploy in place. Once the mind has decided what the ‘deflator’ refers to, it has a hard time accepting a counter explanation without a larger sum of evidence. But again, that’s a recipe for false conclusions and simply a predictable function of the brain’s desire to create certainty instead of ambiguity.

Conversely, if I told you that two jocular workmates came up with strange terms to needle each other with, and those terms were related to the actual work they did every day, would you think that’s strange? It’s possible, without any additional evidence (see sections 3 and 4 below), that he calls himself the “deflator” because he regularly tampers with footballs on Sunday. There are also a myriad of other possibilities for that one text message, especially given McNally’s texting habits and propensity for wild language and nonsensical statements (e.g. “what’s up dorito dink?”).

People use jargon specific to their vocation all the time, and do so in extending humor or personalizing phrases. On the outside (with no context) these references seem meaningless or are misinterpreted. That’s the definition of an “inside joke.” It takes one instant of connecting a football losing weight to a person losing weight and voila, an inside joke. (Or, the only recorded instance of McNally referring to his role in a tampering scheme, a scheme that was otherwise never apparently discussed over text.)

Just imagine what kind of story can be painted when snippets are taken out of context. The possibilities are endless:

Honestly, I just wanted to slip in Brian Williams rapping. Everyone’s so serious about these footballs that they could use a little Gin and Juice.

c. What happens when you add context?

There are a number of instances of people misinterpreting something without context. I was going to cue up a bunch of examples and research experiments, but we need to look no further than the Patriots inclusion of (alleged) other testimony that was omitted from the Wells Report.

Consider that the report tries to make it look like Brady is bribing McNally with gifts. What the report does not include, according to the Patriots, is that Brady regularly gives comparable gifts to “15 non-player personnel.” This, and many other omissions like this in the report, need little analysis to illustrate the problems with context-free conclusions; obviously it looks quite different if Brady only gives alleged co-conspirators gifts, versus a standard distribution of comparable gifts to a number of people regularly.

Similarly, there was a bit in the Wells Report about “getting a needle.” Without context, many interpreted this as fitting a narrative, as the brain is designed to do. But, when you add the context that McNally was the individual who literally provided needles to the officials for their pre game process, it completely changes the meaning of the text. Again, I’ll leave it up to the reader to guess why Wells left out the context that, according to the Patriots rebuttal, was confirmed by all witnesses involved. (Specifically, that McNally had to return to ask Jastremski for another needle at the request of the officials, and that this practice became, like seemingly everything else between them, a running joke.)

The only difference between these examples and the “deflator” joke is that the full context about the weight-loss joke hasn’t been substantiated by others. However, this is the same cognitive mechanism seen in God of the Gaps thinking. It is a fallacy, and the very problem with context-free thinking, to think that just because we don’t currently have that context that an alternative explanation is improbable or, as the Globe believes, impossible. Especially in the face of all of the other insinuations in the Wells Report being corrected.

2. Memory

There are then a number of other major claims in the Wells Report regarding memory.

a. “I don’t know Jim McNally”

Let’s say my doorman is named Bill. I know his face. I know his voice. I know some of his habits. If I were asked about “William McCluskey” I would have to honestly say “I have no idea who that is.” Both of these things are true — I know a doorman named Bill, and I also don’t know someone named “William McCluskey.”

Only it turns out that these are the same person. I would not be lying to say “I don’t know William McCluskey.” Just like Brady wasn’t lying when he said he didn’t know some lowly locker room attendant’s full name. He did think his name was “Burt,” phonetically similar to his nickname, “Bird,” so Brady was indeed aware of the existence of McNally and even had a name to pair to his face. (I’ll leave it up to the reader to decide if the Wells investigators lacked the knowledge to realize this or intentionally framed it to look like a lie.)

b. “I didn’t do anything abnormal”

Next, there is the trip to the bathroom that McNally took. He was originally asked if he did anything out of the ordinary when transporting the balls from the locker room to the field, to which he answered, “no.” He then later said he went to the bathroom, which the investigators presumed to be out of the ordinary, and thus interpreted McNally as being untruthful. This is a terribly false conclusion.

Assuming he does occasionally go to the bathroom, as he claimed, his answer of “nothing was abnormal” and then later “I went to the bathroom”  is indeed consistent — as opposed to a contradiction — based on his own recollections. Because, for McNally, going to the bathroom wasn’t out of the ordinary. Simply because the investigators find it out of the ordinary doesn’t mean it was for McNally.

c. “I’ve never seen THAT before”

The final issue I’ll discuss regarding memory is referee Walt Anderson’s recollection of the ball location. I once worked across the street from a pink house for a month straight. One day the topic of houses came up. I looked out the window and said “look, they just painted that house pink!” I was informed by many others that the house had always been pink, I’d simply never cared or devoted any attentional resources to it.

It is possible that in his entire career, the AFC Championship game was indeed the only time Walt Anderson remembers the balls “going missing.” (Although it does raise the question of why no one else found it strange that a giant man carried a giant bag of footballs out of the room in plain sight.) But, given that he had been primed before the game to pay attention to the balls for the first time in his career, it is expected that he would suddenly notice things he’s never noticed before. It’s possible that every time Anderson was in New England, or anywhere, the procedure was just as lackadaisical as it was during that game…he just never gave any attentional resources to it. Kind of like how that house was always pink.

3. Statistical Analysis and Physics

It is quite clear that no one at the NFL knew anything about the Ideal Gas Law when this story broke. Heck, it’s quite clear most of the public didn’t know about it either. This creates another major bias that is very hard to undo.

Due to the Ideal Gas Law, ever since footballs have been inspected pre game they’ve also been in play at different PSIs. All the time. If we could go back in time and measure balls at half-time of every game, some would be in the 10s, some in the 11s, some in the 12s, etc. People only found the Patriots report abnormal because they had never been introduced to it before. It was a physics problem, and most people, without knowing physics, declared it was abnormal. When explanations of the Ideal Gas Law sprang up on the Internet, people had the same disbelieving reaction that they do now to “the deflator” explanation.

Again, the mind is ripe to do this. We’d like to be measured and cautious and say “I’m not an expert at this so I’ll find out more,” but that’s not how the brain is hardwired to work. By the time the Ideal Gas Law was popularized, people had already made up their mind there was tampering. And it’s very, very hard to undo that. This leads to the aforementioned confirmation bias.

Here’s the thing though: the Wells Report did not present strong evidence that the balls were tampered with, or even likely tampered with. Amazingly, the report tries to bury this finding by making a bunch of assumptions in an attempt to say that it was possible there was tampering, when the “more probable than not” interpretation from any scientist would have to be the opposite conclusion: that it was more probable than not there was no tampering, and that the ambiguities around game day measurements leave open the possibility of foul play.

Analyzing the Wells Report Data

This issue has been discussed in great detail, but I want to translate some numbers to demonstrate why the Wells language is the opposite of what it should be based on the data. The Exponent team commits all sorts of scientific faux pas, such as presenting p-values based on a nonsensical dataset that is literally the best looking data they have to support tampering. (Amazingly, despite all their assumptions, this is the only area of the report where they perform such statistical tests.)

So why is it so disingenuous to compare the Colts averages to the Patriots? Primarily because the Colts balls were measured after the Patriots, so they had ample time to recalibrate to the new indoor temperature, raising the air pressure with every minute that passed. Exponent’s own graphs (fig 22, page 203) show a ~1.0 PSI increase in pressure expected in the Indianapolis balls after 10 minutes indoors…but they make no attempt to adjust the data and retest. From a methodological standpoint, this is astounding. This wouldn’t pass an undergraduate peer review.

A 12.5 PSI football, with all other factors being equal (which they weren’t), is expected to be at 11.32 PSI given the game-day conditions in Foxboro (including an atmospheric pressure of 14.636). Similarly, we’d expect a 13.0 PSI football to be 11.8 PSI when it entered the locker room at halftime. However, as the balls heat up in the locker room after coming off the field, they will rapidly increase in PSI as shown below. All of the Patriots balls were apparently measured first. With perfect instruments, and excluding the effect of water, we’d expect those balls to be about 11.5 PSI after 2 minutes indoors, when measuring could have started, and, based on time estimations, the highest ball would be about 12.2 PSI.
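For readers who want to check the 11.32 and 11.8 figures, here is a short Python sketch of the Ideal Gas Law arithmetic. It assumes a fixed-volume ball filled with dry air, a 71-degree pre-game room and a 48-degree field (the temperatures used in the follow-up post), and the 14.636 atmospheric pressure quoted above. The function name is mine, not Exponent's.

```python
# Sketch: reproduce the expected on-field pressures from the Ideal Gas Law.
# Gauges read pressure above atmospheric, so convert to absolute pressure first.

def gauge_psi_after_temp_change(gauge_psi, start_temp_f, end_temp_f, atm_psi=14.636):
    """Gauge PSI of a fixed-volume ball after its air goes from start to end temperature."""
    start_abs_temp = start_temp_f + 459.67   # degrees Rankine
    end_abs_temp = end_temp_f + 459.67
    absolute_psi = gauge_psi + atm_psi
    return absolute_psi * end_abs_temp / start_abs_temp - atm_psi

# 71 F pre-game room and 48 F on the field.
print(round(gauge_psi_after_temp_change(12.5, 71, 48), 2))  # 11.32 (Patriots)
print(round(gauge_psi_after_temp_change(13.0, 71, 48), 2))  # 11.8  (Colts)
```

The conversion to absolute pressure matters: scaling the 12.5 gauge reading directly by the temperature ratio would understate the drop.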

Meanwhile, starting the Colts measurements at the 10 to 11 minute mark, we’d expect their balls (13 PSI pre game) to measure in the 12.8 to 12.9 range. In other words, in 10 minutes indoors, the Colts balls would have almost completely returned to pre game levels, while a large chunk of the Patriots balls would be significantly closer to outside-condition measurements.

Now, that 11.32 number does not include water, which changes the volume of the ball and has an additional effect on the pressure. Exponent seems to contradict themselves here, stating first that they couldn’t observe any volume change in a wet ball (thus making water moot). However, in their “spraying” test, where they simulated outdoor conditions with wet balls, they clearly find a difference between wet and dry balls. Again, significant because the Patriots balls were wet and the Colts balls were protected in a bag and unused at the end of the first half. Below are the expected readings based on time indoors and dry/wet conditions:

Figure 22 of the Wells Report

Now here’s the actual data:

Screen Shot 2015-05-16 at 4.38.08 PM

Note that we wouldn’t really ever expect to see a reading below 11.2, according to Exponent, even with water involved. So, unless something else needs to be incorporated, something beyond the factors we’ve examined would have additionally deflated the footballs per Clete Blakeman’s readings. (Assuming that it’s not simply transcription error or gauge inaccuracy.)

Amazingly, it turns out something else does need to be incorporated: Walt Anderson possessed a gauge that read roughly 0.3-0.45 PSI below the other gauge. Exponent believes his gauges were consistent — something in science we call “reliability” — despite not being accurate. The Patriots expressed concerns about reliability when they pointed out that the intercepted football measured 11.45, 11.35 and then 11.75 using the same gauge on the sideline, although Exponent did indeed test the gauges in question and found them to be fairly reliable. We’ll assume they are reliable (consistent) for the rest of the post, although this is clearly an area that could create additional variance in the readings.
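As a quick illustration of the reliability concern, the three sideline readings of the intercepted ball quoted above can be summarized in a few lines of Python. This is only descriptive arithmetic on those three numbers, not a reliability test of the gauges themselves.

```python
# Sketch: spread of the three sideline readings of the intercepted ball (same gauge).
from statistics import mean, stdev

readings = [11.45, 11.35, 11.75]  # PSI, the sideline readings cited above

spread = max(readings) - min(readings)
print(f"mean {mean(readings):.2f} PSI, range {spread:.2f} PSI, "
      f"sample std dev {stdev(readings):.2f} PSI")
```

The range across the three readings is 0.40 PSI, which is the same order of magnitude as the 0.3-0.45 PSI gap between the two gauges.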

EDIT: See the follow-up post for visualizations of both the Logo and Non-Logo measurements as described on page 70 of the Wells Report.

The Logo Gauge (Higher readings) Scenario

Anderson claims to have used the higher gauge to take the pre game measurements. Why does this matter?

  • If we examine just the presumed logo gauge measurements between pre game and halftime, only the 4th and 10th Patriot ball fall just below our expected floor (by 0.2 and 0.3 PSI, respectively).
  • If we examine just the non-logo gauge measurements and assume Anderson used the logo gauge pre game, then we’d expect something like a floor of 10.9 PSI without water involved, and thus probably nothing below 10.7 PSI. There, the 4th ball is right on the cutoff and the 10th ball 0.2 PSI below.
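The 10.9 PSI floor in the second bullet is the Logo-gauge expectation shifted by the gap between the gauges. A small Python sketch of that arithmetic, assuming the 11.32 PSI dry on-field expectation for a 12.5 PSI ball and the 0.3-0.45 PSI offset quoted earlier (the variable names are mine):

```python
# Sketch: translate the expected floor from the Logo gauge onto the Non-Logo gauge.
LOGO_FLOOR_PSI = 11.32            # expected dry on-field PSI for a 12.5 PSI ball
OFFSET_RANGE_PSI = (0.30, 0.45)   # how much lower the Non-Logo gauge reads

non_logo_floor = [round(LOGO_FLOOR_PSI - offset, 2) for offset in OFFSET_RANGE_PSI]
print(non_logo_floor)  # [11.02, 10.87]
```

The lower end of that range, 10.87, is the "something like 10.9" used above. Allowing for wet balls is what then brings the cutoff down to about 10.7.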

In other words, if you believe Walt Anderson, then almost all of the Patriots balls were found to be in a range that demonstrates non-tampering. Not the opposite.

From what I’ve seen about variability in measurements, I’m not comfortable chalking up 0.2-0.3 PSI on two balls to “tampering” factors outside of gauge reliability, transcription error, or some other subtle natural effect (i.e. additional water) that we aren’t accounting for. [EDIT: As you can see in the time projections in the next post, variance is clearly an issue here with both teams’ balls.] Heck, the sample size of this experiment is really 1, because we’d need to test balls at halftime for a handful of games to see if there are readings that also fall just outside the range predicted by Ideal Gas Law or if that is indeed abnormal, even by a small amount. (That’s where you’d publish a p-value, FYI.)

For instance, the Colts 3rd ball measures 12.95. Exponent believes this is a transcription error because it would be the only instance of the Non Logo Gauge measuring higher than the Logo Gauge. Simply introducing this kind of measurement variability essentially puts every Patriot football measured within the expected norm.

In other words, if the Logo Gauge was used pre game, it’s most likely the Patriots balls were not adjusted or tampered with.

The Non-Logo Gauge Scenario

Now, there’s another major issue that Exponent also glosses over. If the Non-Logo Gauge were used in the pre game, then how does one explain the Colts readings on that gauge? Indy’s four balls assumed to be measured at halftime by the Non-Logo Gauge exhibited PSIs at 12.7, 12.75, 12.5 and 12.55. However, look at the dry temperature curve presented above. Those balls should all clearly be above 12.8 PSI.
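The size of that gap is easy to tabulate. The Python sketch below uses the four Colts readings quoted above and treats 12.8 PSI as the minimum expected level, so the shortfalls it prints are lower bounds (the variable names are mine):

```python
# Sketch: how far each Colts Non-Logo reading sits below the ~12.8 PSI dry-curve level.
colts_non_logo = [12.7, 12.75, 12.5, 12.55]  # halftime PSI readings quoted above
EXPECTED_MIN_PSI = 12.8                      # readings "should all clearly be above" this

shortfalls = [round(EXPECTED_MIN_PSI - psi, 2) for psi in colts_non_logo]
mean_shortfall = round(sum(shortfalls) / len(shortfalls), 3)
print(shortfalls, mean_shortfall)
```

All four balls come in under the expected level, by at least 0.05 to 0.3 PSI each and about 0.175 PSI on average.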

If the Non-Logo Gauge were used pre-game, then something doesn’t add up with the Colts balls.

In other words, using Exponent’s (arbitrary) decision to discount Anderson’s claim that he used the Logo pre-game and assume he actually used the Non-Logo gauge, then the Colts balls have the exact same problems (to almost the exact same degree) that the Patriots balls exhibited. [EDIT: This can be clearly seen in the follow-up post as the Colts balls “shift down” in the Non-Logo scenario along with the Patriots balls.]

In conclusion, assuming we believe the (unrecorded) pre game measurements of ~12.5 for Patriot balls and 13.0 for Indianapolis balls, the data shows that

  • if Anderson used the Non-Logo Gauge in the pre game, the Colts balls were also slightly below where they should be based on the physics
  • if Anderson used the Logo Gauge, two Patriot balls were slightly below where they should be based on the physics, to almost the exact same degree that the Colts balls would be in the Non-Logo scenario

Contrary to Exponent’s conclusions, this procedure is not reasonably scientific because it’s predicated on Anderson’s recollection that the Patriots balls were 12.5 and Colts balls were 13.0. (Memory is unreliable.) There were also no consistent (or repeated) measurement standards and no controls, such as measuring the balls at the same time. With that said, since we can safely assume the Colts didn’t tamper with the balls, and since the Colts balls exhibit the same minor failure to align with the Ideal Gas Law and other physics factors that the Patriots balls do, it’s highly likely that there was no tampering if Anderson used the Logo Gauge.

What if Anderson used the Non-Logo gauge, as Exponent suggests? Then only three of the Patriots 11 balls are where we’d expect to see them, and the 10.5 measurement from ball No. 10 seems particularly problematic. If that ball were indeed properly measured at 12.5 pre game by the Non-Logo gauge, and the gauge is consistent in its measurements, then that ball is approximately 1 PSI below where it should be. Even if we allot for the same 0.2-0.3 PSI outliers we see from both the Patriots balls and the Colts balls, this does suggest factors unexplained by temperature, air pressure and precipitation. [EDIT: If we account for time, the Colts balls also exhibit the same problems in the Non-Logo scenario, as demonstrated by the time projections in the next post.]

4. Is there a plausible explanation for Wells’ claim?

Despite reading a good deal of reaction to this story, I have yet to encounter a coherent explanation for what is being alleged. The Wells Report is quite careful not to author such a story, using vague language instead. But let’s actually spell out what they are alleging:

  • Walt Anderson was wrong about what gauge he used in the pre-game, thus the Patriots have some under inflated footballs because of tampering
  • Eight of their footballs seem under-inflated by (approximately) 0.5-0.6 PSI on average, with the lowest football being about 1.0 PSI below expected
  • Those footballs were deflated by Jim McNally in the 100 seconds he was in the bathroom with the balls
  • McNally has done this regularly since at least 2013, because he calls himself the “deflator”

In order to believe the above story, one also must believe the following:

  1. Brady would have to figure out at some point in time that he preferred balls just under 12.5 PSI, in the 11.5-12.0 PSI range.
  2. Brady determined that the difference between 11.5-12.0 PSI and 12.5 PSI was so great that he felt he needed to ask an employee to tamper with the footballs, and not even risk under inflating them and hoping they passed inspection.
  3. However, Brady could only tamper with the footballs at home. He’d be using footballs that were so different in his mind that they were worth tampering with at home…but on the road, he would be out of luck. Despite this, he’s better on the road than most NFL QBs.
    • Note that this makes the Indianapolis Colts claim that the Patriots used deflated footballs in Indianapolis during the regular season essentially impossible.
  4. During the October, 2014 game against the Jets, “the deflator” Jim McNally failed to deflate at least some footballs (that were 16 PSI)
  5. Tom Brady, after blowing up on the sideline about the quality of the footballs during that game, performed the following as a charade to protect the cover-up:
    • Brady (allegedly) in front of others, declared he wanted balls at the low permissible range (~12.5 PSI) before giving them to the referee. (Even though McNally was already deflating balls…so why would they not already be at the low range to save McNally time in his deflation process?)
    • Brady brought a rule book to the officials to show them 12.5 PSI balls should not be touched…even though he knew McNally was going to alter them.
  6. Furthermore, Brady would have to go through the charade of inspecting the balls pre-game in front of other people, knowing that these would not be the balls he would be playing with. His true pre-game ritual was one of the following:
    • He secretly inspected footballs at some 11.5-12.0 range and then told the staff to inflate them to 12.5 so he could stage a second, phony inspection in front of others every game (right before the balls are delivered to the officials), while no one noticed him missing or sneaking away during this period, OR
    • He simply inspected them at 12.5 for tack and feel, knowing that once he let out a little air, the PSI would be where he wanted it. This explanation assumes that he is both so meticulous about PSI that he wanted less than a pound of PSI (which no human can seemingly detect) out of the ball and simultaneously does not think there would be a tactile difference between a 12.5 and 11.5 PSI ball that he needs to actually inspect the real ball-condition he will play with.
  7. Jastremski and Brady are either horrible at tampering — setting balls to 12.75-12.85 instead of the lowest permissible 12.5 before the Jets game (which would make McNally’s work harder), or they did indeed start at 12.5 pre-October, 2014 and decided to make up a story to tell the investigators that they used to inflate to 12.75-12.85 so Brady could plausibly deny ever knowing about PSI before October, 2014.

I have yet to see Wells, or anyone, make sense of this convoluted, contradictory set of events that would have had to happen according to their reading of the “deflator” text and the allegations about regularly deflating footballs. Which of the following conclusions seems more likely to you?

Conclusion A: Tom Brady figured out that he really liked footballs just under the legal limit, decided not to have his equipment team slightly under inflate balls to hope they would pass a lackadaisical NFL inspection process (the technique Aaron Rodgers told Phil Simms about), but instead set up an elaborate process to take just a little air out of the balls, but only at home. And Walt Anderson forgot what gauge he used.

Conclusion B: Walt Anderson correctly remembered what gauge he used, footballs can have incredibly small perturbations outside what we’d expect based on just temperature and pressure and Jim McNally was indeed referring to “deflating” his waist.

Can anyone reconcile these issues? Because Occam’s Razor says Conclusion B is a significantly more — excuse me — Conclusion B is “more probable than not.” Yet despite this, recent polls suggest that a majority of the country believe the Patriots cheated…without actually being able to offer up a coherent story for how the events in the Wells Report make sense. This is not the same thing as knowing very little and failing to create a plausible story, because in this case, there is evidence in the report that needs to be reconciled with the claim because it seemingly contradicts the claim or relies on less-than-likely dependencies.

Conclusion

  1. A lack of context and predictable cognitive biases can make text messages appear to be something they aren’t, and make alternative explanations less believable than they really are
  2. A lack of understanding around memory likely led the Wells investigators to a number of false conclusions
  3. The AFC Championship game data show the following:
    • If Walt Anderson used the Logo Gauge, it essentially proves the Patriots did not tamper and never have (since McNally is caught on tape going into the bathroom but the balls don’t appear tampered with.)
    • If Walt Anderson used the Non-Logo Gauge, it opens up the possibility that the Patriots tampered with the footballs, based on the story below.
  4. People are comfortable claiming tampering despite the story from the Wells Report lacking coherence and requiring the following to be true:
    • Brady would have discovered he feels a large enough advantage in slightly deflating, when no one else seems to be able to tell the difference
    • He would have been OK with using different (non-deflated) footballs, on the road, despite leading the 2006 rule change to create uniform preparation for QBs at home and on the road
    • Despite tampering only being carried out at home, Brady’s performance has been better on the road.
    • Despite his better road performance, Brady still went through with tampering by carrying out a phony inspection in the locker room before every game (because those would not be the conditions of the balls he would use post-McNally deflation.)
    • After the October, 2014 game against the Jets, Brady extended the charade by providing a copy of the rulebook to the officials before games, knowing full well that McNally would deflate below 12.5 anyway.

PS Please don’t use this post to disparage others. It’s designed to educate, regardless of your opinion or rooting allegiances.

PPS Here’s the follow-up post visualizing the Logo Gauge and Non-Logo Gauge measurements for both teams

Is Tom Brady Better at Home?

All of this talk about deflating footballs raises a natural question: How does Tom Brady’s performance at home (where the Patriots are alleged to have tampered with the footballs) differ from his performance on the road (where, based on the existing accusations, they would not have tampered with the balls)?

The following table lists all the quarterbacks in the last two seasons who have attempted at least 200 passes at home and on the road. The numbers shown are the difference between home and road performance. In other words, a positive number means a higher number at home. Below are the results:

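[Table: home minus road differences for quarterbacks with at least 200 pass attempts both at home and on the road over the last two seasons]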

In the last 2 years, Aaron Rodgers has shown the greatest improvement at home relative to away games. Rodgers ranks first in interception percentage drop, first in increase in touchdown percentage, second in increase in yards per attempt (Brandon Weeden is first) and first by a landslide in QB Rating. Tom Brady is 10th in improvement at home in QB rating, leagues behind Rodgers’ amazing 37.9 QB Rating jump in home games.

What if we expand the sample to go back to the 2006 rule change where quarterbacks from each team could control the ball? How does Brady look then? (Min 500 attempts home and away to qualify for this sample.)

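[Table: home minus road differences for quarterbacks with at least 500 pass attempts both at home and on the road since the 2006 rule change]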

Brady has actually been quite poor over the long haul at home, at least from a basic statistical perspective. Brady is well below average in these areas, showing a pretty significant road bias (not home) in performance. The average for these 45 qualifying quarterbacks was an improvement of 3.9 QB Rating points at home, 0.2 more yards per attempt at home, a 0.4% bump in TD% and 0.2% drop in INT%. Brady does throw fewer interceptions at home per pass, but his other numbers actually trend in the opposite direction and are better on the road. His QB Rating is a shade better at home, but again, that indicates abnormally strong play on the road relative to the rest of the quarterbacks in the league.
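For readers who want to reproduce this kind of home/road comparison, here is a minimal sketch. It uses the standard NFL passer rating formula; the two stat lines are hypothetical placeholders, not any quarterback's real numbers, and the function and key names are my own.

```python
def passer_rating(attempts, completions, yards, touchdowns, interceptions):
    """Standard NFL passer rating: four components, each capped to [0, 2.375]."""
    def clamp(x):
        return max(0.0, min(x, 2.375))

    a = clamp((completions / attempts - 0.3) * 5.0)
    b = clamp((yards / attempts - 3.0) * 0.25)
    c = clamp(touchdowns / attempts * 20.0)
    d = clamp(2.375 - interceptions / attempts * 25.0)
    return (a + b + c + d) / 6.0 * 100.0


def home_minus_road(home, road):
    """Differences between home and road splits; positive means higher at home."""
    def line(s):
        return {
            "rating": passer_rating(s["attempts"], s["completions"], s["yards"],
                                    s["touchdowns"], s["interceptions"]),
            "yards_per_attempt": s["yards"] / s["attempts"],
            "td_pct": 100.0 * s["touchdowns"] / s["attempts"],
            "int_pct": 100.0 * s["interceptions"] / s["attempts"],
        }

    h, r = line(home), line(road)
    return {key: h[key] - r[key] for key in h}


# Hypothetical stat lines, for illustration only.
home = {"attempts": 300, "completions": 195, "yards": 2300,
        "touchdowns": 18, "interceptions": 5}
road = {"attempts": 300, "completions": 186, "yards": 2150,
        "touchdowns": 14, "interceptions": 7}

split = home_minus_road(home, road)
for stat, diff in split.items():
    print(f"{stat}: {diff:+.2f}")
```

Note that for interception percentage a negative difference is the good direction, since it means fewer interceptions per pass at home.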

Fumbling Statistics and Patriot Trends

There has been a lot of discussion (and misinformation) floating around on fumbling. Because fumbling is not exactly a sexy topic, it’s not something that gets a lot of love, perhaps outside of Bill Barnwell’s research on the randomness of fumble recoveries. The purpose of this post is to clarify general fumbling trends, behaviors and how the New England Patriots fit into this puzzle.

For this research, I’m using PFR’s database. I’ll try and highlight wherever possible any gaps in the data.

I. 2007-2014 Totals

Since 2007, the Patriots have the lowest fumbling rate in the NFL. There are a lot of reasons for this — as in, the individual areas in which they excel — and many possible explanations for those reasons (e.g. “they are better coached”). We’ll explore some of these in a moment. First, here are the league-wide fumbling rates from 2007-2014, using both special teams and offensive plays:

Image

The Patriots fumble the least frequently over this time period. Their claim to the top spot is a difference of 3 total fumbles in 7 years over the second-place Atlanta Falcons. Some other observations:

Dome Teams vs. Outdoor Teams (based on home stadium)
Dome teams fumble% = 1.83%
Outdoor teams fumble% = 1.94%

The sample size is small (9 dome teams), but there is very little, at least on the surface, to suggest playing in a dome reduces fumbling rates. At least not by anything we can detect with these sample sizes and the data parsed this way. There is no reason to treat teams that play their home games outdoors as a different set of fumblers than teams that play their home games indoors.

So, how does fumbling break down under a more granular microscope? Do some teams fumble more or less in the special teams than in the run or pass game? (Teams use different, shared “K-Balls” more geared toward kicking on special teams plays). How do QB’s fit in? Pass protection? The following sections take a deeper look.

II. Special Teams vs. Offensive Plays

The league average fumbling rate on special teams and on rushing/passing plays is starkly different. If general common sense weren’t enough evidence, this is overwhelming support that fumbling is not random. This makes sense, as impact/effort/vulnerability and ball security all contribute to a fumble. The plays on special teams are often more violent and more hectic. Here’s the difference:

2007-2014 NFL Averages
Special Team plays fumble%: 2.81%
Offensive plays fumble%: 1.77%

And a breakdown of Special Teams fumbling rates by team since 2007:

Image

III. The Patriots on Offensive Plays Alone

The Patriots have a larger edge over the field when looking at just offensive plays since 2007. This is the impetus for the statistical “analysis” angle that they have been playing with advantageous balls. Allow me to make one comment outside of statistics for a second: High School and collegiate athletes take steroids. Not all make the NFL. The bad can cheat to be average and the great can cheat to be legendary. Thus, nothing can prove a team didn’t cheat. However, we can “prove” that something is either a reasonable outlier or not really an outlier at all, which is where this next figure fits in.

Image

Here New England is well ahead of the league. Second-place Atlanta would have needed 18 additional fumbles in 7 years (is that a lot?) to pull even with the Patriots. Speaking of Atlanta, while New England’s z-score of 2.70 (assuming a normal distribution of fumbling exists) is impressive, it’s not some anomaly. Atlanta boasts a better z-score over the same time period on special teams, as shown in section II. Statistically, this means that it’s very likely something is causing Atlanta and New England to excel in these areas other than just randomness. Until 2 weeks ago, that something was universally explained by

  • coaching
  • ball security of individual players
  • scheme
  • opponent
  • game situation

and so on…in other words, fumbling isn’t random because there are areas where teams and individuals can excel that allow them to fumble less frequently than others.
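As a rough sketch of what a z-score like 2.70 implies, the snippet below computes a z-score from a list of team fumble rates and converts it to a tail probability under a normal distribution. The normality assumption is the same one flagged above, and the example rates are hypothetical, not the PFR figures.

```python
import math
from statistics import mean, pstdev


def z_score_below_mean(value, population):
    """How many standard deviations `value` sits below the population mean.

    Positive results mean a lower (better) fumble rate than average.
    """
    return (mean(population) - value) / pstdev(population)


def upper_tail_probability(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Hypothetical fumble rates (in percent) for a handful of teams.
example_rates = [1.35, 1.62, 1.70, 1.77, 1.81, 1.90, 1.95, 2.10]
example_z = z_score_below_mean(example_rates[0], example_rates)

# The z-score quoted in the text for New England on offensive plays.
tail = upper_tail_probability(2.70)
expected_teams = 32 * tail  # teams expected this far out by chance alone

print(f"example z-score: {example_z:.2f}")
print(f"P(Z > 2.70) = {tail:.4f}")
print(f"expected teams out of 32: {expected_teams:.2f}")
```

A small tail probability like this is what supports the reading that something other than randomness (coaching, personnel, scheme) separates the best teams, though it says nothing about which factor is responsible.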

IV. Quarterbacks fumble the most

Speaking of quarterbacks, they fumble the most. Since 2007, PFR has 4,822 fumbles on record coming from skill position players on offense (the QB’s, RB’s, WR’s and TE’s). Of those fumbles, nearly half belong to quarterbacks, a not-so-shocking revelation when we consider all the additional variables in play for a QB fumbling other than simply “running with football.” Quarterbacks have a center-exchange of the ball, a handoff exchange, and most importantly, are smashed by much larger men, often when they are trying to do something other than “run” and might have no idea that they are going to be smashed. (“The strip sack.”)

Image

So the QB fumbling performance of a team will have a sizable impact on the overall offense’s fumbling rate. This means that if a team is equipped with a QB with excellent ball-handling, who makes quick decisions and avoids pressure in the pocket well, he (and the scheme and offensive line, technically), can single-handedly inflate or deflate a team’s fumbling rate. Here are the top fumbling teams by QB since 2007 (Plays include passing attempts, rushing attempts and sacks):

Image

The quarterbacks of those top teams are some of the least-frequent fumblers in the league. Here are the top QB’s, minimum 500 passing attempts, since 2007. Note, fumble% for QB’s includes their passing attempts, so the number of “plays” involved are passing attempts, sacks and rushes.

Image

While Tom Brady excels here, he isn’t even on the top line. Peyton Manning, with his incredibly low 0.70% fumble rate, clocks in at the top (along with former Brady backup Brian Hoyer), which makes for an interesting case study, because Manning switched teams a few years ago. Here is how Manning’s teams fared over the years with and without him.

Image

The 2005 and 2006 numbers are best in the league, posting z-scores of 2.10 and 2.14. Manning is barely clipped by Brady’s Patriots in 2007 (1.18% to 1.13%) only to regain the top spot in 2008 at 0.83%. Only six teams from 2003-2010 posted fumble rates under 1%, and Manning’s Colts did it four times, in addition to a 1.01% season in 2010. Manning left and the 2011 Colts offensive fumble rate plummeted to second-worst in the league. Meanwhile, he joined a Broncos team with the worst 2011 fumble rate (2.56%) on offensive plays and in 2012, helped improve them to 1.28%, good for 10th in the league. The Colts 5-year run from 05-09 was the best of any team since 2003 at 0.98% fumble% over 5 years, narrowly edging the 2010-2014 Patriots at 0.99%.

V. The League Has Improved Ball Security

Everyone has improved their fumble% over the last decade. From 2008-2013 the league-wide fumble% declined every season, before blipping back up in 2014.

Image

Notice that the best fumble season on offense in this period was the incredible 2011 New Orleans Saints season, just 0.54%. The Saints had another banner year in 2013 with a 0.74% rate. While 2003-2010 saw only 6 sub-1% fumble rate seasons from offenses, 2011-2014 produced 12. Three of those 12 were from the Patriots, who added a 0.74% of their own in 2011.

VI. How did the Patriots do It? A Law Firm?

The Patriots not only have excelled with Brady protecting the ball, but in their running game as well. Anecdotally, here is where Bill Belichick has always prioritized ball control, quick to bench talented runners who don’t take care of the ball. The Patriots finished with the best fumble rate at RB from 2007-2014, as shown below:

Image

BenJarvus Green-Ellis never fumbled in 536 offensive touches in New England. If those 536 touches were replaced by a running back who fumbled at a league average rate (1.07% for RB’s), the Patriots would have added about six fumbles to their total over this time period. While that may sound like a small amount, New England running backs fumbled a total of 26 times from 2007-2014; six additional fumbles is a 23% increase in total fumbles, and takes the Patriots from 1st in this category down to 11th (0.92%). In other words, one could say the difference between Green-Ellis and an average ball-handling back is the difference between the Patriots finishing 11th and 1st in this category.
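The replacement-level arithmetic here is easy to check. This sketch uses the figures quoted above (536 touches, a 1.07% league-average RB rate, 26 running back fumbles); the last line adds a simplifying assumption of my own, that every touch is an independent trial, to show how unlikely a zero-fumble stretch would be for an average back.

```python
touches = 536             # Green-Ellis touches in New England, per the text
league_rb_rate = 0.0107   # league-average RB fumble rate, per the text
actual_rb_fumbles = 26    # Patriots RB fumbles from 2007-2014, per the text

# Fumbles an average back would be expected to add over the same touches.
expected_added = touches * league_rb_rate
added_rounded = round(expected_added)

# Relative increase over the actual running back total.
relative_increase = added_rounded / actual_rb_fumbles

# Assumption: each touch is an independent trial at the league-average rate.
prob_zero_fumbles = (1.0 - league_rb_rate) ** touches

print(f"expected additional fumbles: {expected_added:.1f}")
print(f"relative increase: {relative_increase:.0%}")
print(f"chance an average back goes fumble-free: {prob_zero_fumbles:.2%}")
```

Under that independence assumption, an average back would get through that many touches without a fumble well under 1% of the time, which is consistent with treating Green-Ellis as a genuine ball-security outlier.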

Patriots wide-receivers have not been as stellar over the same period.

Image

They are 12th, slightly above the league average of 2.05%, but don’t stand out here in the way that the RB’s or QB does.

Of course, the other key to the Patriots excelling in these areas is their consistency. They don’t have a number of outlying individual seasons, but instead are very good every year. Focusing on where they really excel (fumbles per offensive play), you can see from their yearly ranking how their consistency helps them gain a lead over the league in this selected time period (a time period where they have gone 100-28 as a team, grabbed 6 first-round byes and are playing in their 3rd Super Bowl).

Patriots Offensive Rank, fumble% by year
2007: 1st
2008: 4th
2009: 8th
2010: 1st
2011: 2nd
2012: 5th
2013: 19th
2014: 2nd

2013 saw the departure of a number of skill position players, and included Stevan Ridley coughing up 4 fumbles in 188 touches, for a fumbling rate of 2.12%, nearly double the league average at RB. LeGarrette Blount joined the team and fumbled 3 times in 158 touches (1.90%). That’s all it takes in the NFL to have a bad fumbling season.
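To see how little it takes, here is a sketch that treats each touch as an independent trial at the league-average RB rate of 1.07% and asks how often a perfectly average back would match those 2013 fumble counts. The independence and constant-rate assumptions are mine and are simplifications.

```python
from math import comb


def prob_at_least(k, n, p):
    """Binomial probability of k or more fumbles in n touches at rate p."""
    below_k = sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k))
    return 1.0 - below_k


LEAGUE_RB_RATE = 0.0107  # league-average RB fumble rate, per the text

ridley_rate = 4 / 188
blount_rate = 3 / 158

# Chance an average back fumbles at least this often over the same workload.
p_ridley = prob_at_least(4, 188, LEAGUE_RB_RATE)
p_blount = prob_at_least(3, 158, LEAGUE_RB_RATE)

print(f"Ridley rate: {ridley_rate:.2%}, average back matches or exceeds: {p_ridley:.1%}")
print(f"Blount rate: {blount_rate:.2%}, average back matches or exceeds: {p_blount:.1%}")
```

With workloads this small, even a league-average ball carrier lands on counts like these fairly often, so a single season's fumble total is a noisy signal.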

VII. Fumbling Rates Correlate with Good Offenses

Finally, as you may have noticed, some really good offenses protect the ball. The following offensive categories are negatively correlated with fumble%, each with a correlation coefficient of roughly 0.4 in magnitude:

Yards (0.37)
1st-Down% (0.39)
TD’s (0.41)

In other words, better offensive teams tend to fumble the ball less. This isn’t universally true (thus the moderate coefficient of 0.4), as teams with scrambling quarterbacks or teams with relatively bad pass protection might completely buck the trend, just as conservative offenses can do so in the opposite direction. But in general, good offenses, and especially QB’s that are good with the ball, will lead to less fumbling and increased offensive efficacy.
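For anyone who wants to run this kind of check on their own data, here is a minimal sketch of a Pearson correlation between fumble% and an offensive category. The team values below are hypothetical and only show the mechanics; the last line squares the roughly 0.4 coefficient quoted above to show the share of variance it accounts for.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical team seasons: fumble% and offensive touchdowns.
fumble_pct = [1.1, 1.5, 1.8, 2.0, 2.3, 1.3, 1.9, 2.6]
touchdowns = [50, 38, 41, 30, 35, 44, 28, 33]

r = pearson_r(fumble_pct, touchdowns)
print(f"correlation between fumble% and TDs: {r:.2f}")

# A coefficient of about 0.4 in magnitude explains about 16% of the variance.
variance_explained = 0.4 ** 2
print(f"variance explained at |r| = 0.4: {variance_explained:.0%}")
```

That second number is a useful reminder of what "moderate" means here: most of the team-to-team variation in fumbling is left unexplained by offensive quality alone.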