Deflategate: Exponent’s Bias and the Master Error

With all of the publicized corrections to the science section of the Wells Report, I’ve been asked by more than one person whether Exponent, the author of said section, was simply incompetent, or whether they were biased. It’s a question that might have legal ramifications in the near future for Tom Brady.

As I’ll detail below, there is a body of evidence suggesting that Exponent’s report was not merely the result of bad science, but was conducted with a clear anti-Patriot bias. They repeatedly made errors, or looked only at possibilities, that weakened the Patriots’ position, without ever making an error in the Patriots’ favor. The nature and frequency of these errors make coincidence unlikely. Furthermore, Exponent committed a major error in one of their key figures, an error that allowed them to report, incorrectly, an anti-Patriot conclusion back to Ted Wells. What exactly am I referring to?

Not accounting for time of halftime measurements

At a high level, the biggest methodological error Exponent commits is not properly accounting for the time differences of when the balls were measured at halftime. This leads to a nonsensical statistical test that they publish to establish “statistical significance.” The problem is, they knew about this factor. They too considered it a salient factor. They made multiple transient curves mapping how things change depending on when they were measured at halftime.

They didn’t stop there.

They dedicated an entire section (Table 13, page 58) to perform a mini-version of the analysis I present here, using periods of “average measurement time” to compare the difference between expected PSI and observed PSI at a given time.

Wells writes, on page 122:

“According to Exponent, the environmental conditions with the most significant impact on the halftime measurements were the temperature in the Officials Locker Room when the game balls were tested prior to the game and at halftime, the temperature on the field during the first half of the game, the amount of time elapsed between when the game balls were brought back to the Officials Locker Room at halftime and when they were tested, and whether the game balls were wet or dry when they were tested.”

So they thought a lot about the impact of the timing of halftime measurements. On page 57, in one of many mentions of this:

“A similar effect is seen in the game day simulation data; the average pressure rises as the average measurement time is increased.”

Again on page 62:

“Based on the transient curves explained above, one would expect that if the Patriots footballs were set to a consistent or relatively consistent starting pressure, the pressure would rise relatively consistently as they were tested later in the Locker Room Period.”

Yet they still published their p-values on page 11 and conducted analyses in the opening pages without considering time! This cannot be due to incompetence since they are keenly aware of and explicitly call out the importance of time on multiple occasions. On page 64, in their concluding statement, their second point cites these statistical tests as critical pieces of evidence supporting their conclusion. Unless different people prepared different parts of the report, this is evidence of a clear bias against the Patriots. But it’s also just the beginning.

Switching Fig. 26 to the extreme low temperature of 67 degrees

The transient curve used in Figure 24 to project Non-Logo gauge results uses a pre-game room temperature of 71 degrees. The HVAC on the day of the game was set between 71 and 74 degrees. But Exponent measured the temperature in the room where the balls were gauged by officials in the pre-game to range from 67-71 degrees. It was a good 30 degrees colder outside on the day Exponent measured, and there wasn’t the same game day activity where numerous people give off extra heat in the room.

When they project the Logo gauge results on the transient curve used in Fig. 26, they switch the pre-game temperature to 67 degrees, the extreme end of the plausible spectrum and the one that produces the lowest Patriot reading. Their explanation for using 67 degrees is that it makes the Colt measurements align with the projections. This is a reasonable approach, given that the Colt balls “should” obey the laws of physics, but (a) it should not be the only scenario examined and (b) they did not need to drop the pre-game temperature all the way to 67 degrees to achieve this! Doing so only increased the appearance of guilt for the Patriots. The Colt readings are still viable and within Exponent’s “range” of what is predicted by physics even with a 69 degree pre-game temperature.

Misting the footballs to simulate rain

When accounting for water, as described on page 42 (footnote 36), footballs were sprayed every 15 minutes with a hand held spray bottle and then toweled off immediately. As has been demonstrated, this is a minimal attempt at simulating rain. This is critical to interpreting the results (that will be discussed below and that reflect those presented here); Exponent’s wet curves between Figure 24 and Figure 26 show an additional effect of about 0.25 PSI due to wetness simply from running the simulation again. Yet, as we’ll see in a second, they cannot imagine how the Patriot footballs would be a few tenths below where they were expected based on temperature-only projections.

Not calculating the actual PSI differences from expected

The mini experiment Exponent runs in Table 13 produces the following results: at the earliest plausible time (let’s use the 4:17 reading), Patriot averages should have been 11.54 PSI on the Non-Logo gauge. The actual Master-adjusted halftime average on the Non-Logo gauge was 11.09 PSI. So the Patriots are -0.45 PSI from expected. The Colts Non-Logo average was 12.29 according to Table 11 on pg 45. (This is because Exponent uses the “switch” option to correct for the anomalous 3rd Colt ball.) Therefore, the Patriot balls are about 0.4 PSI below the Colt balls relative to expected. Is that clear from Table 13?
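The gap described above is a single subtraction, sketched here in Python. The function name is hypothetical and the two PSI values are simply the ones quoted in this post from Exponent’s report; this is an illustration of the arithmetic, not Exponent’s own calculation.

```python
def shortfall_from_expected(expected_psi: float, observed_psi: float) -> float:
    """Return how far the observed halftime average sits below the projection, in PSI."""
    return expected_psi - observed_psi


# Values quoted in this post: Non-Logo gauge, Master-adjusted, 4:17 reading.
patriots_expected = 11.54
patriots_observed = 11.09

patriots_gap = shortfall_from_expected(patriots_expected, patriots_observed)
print(f"Patriots shortfall from expected: {patriots_gap:.2f} PSI")
```

The same function can be applied to the Colt average (12.29 PSI) once a projected Colt value for a given measurement time is read off Table 13.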

Exponent Table 13

Not only is it unclear, Exponent never even publishes the differences. They fail to calculate or discuss perhaps the most specific and important detail of all of their experimentation, instead simply noting that the Colt readings are in-line with these simulations and the Patriot readings are not. This is not incompetence, it is a bias of omission. More importantly, are the Colt measurement times in Table 13 even plausible?

Assuming the Colt balls are measured before the Patriot balls

Exponent assumes, contrary to the evidence, that the Colt balls were gauged before the 11 Patriot balls were reinflated. This is yet another anti-Patriot “error” or instance where they refuse to examine other plausible scenarios. The repeated and consistent manner in which this happens is hard to chalk up to coincidental incompetence.

Wells does not explicitly state that the Colt balls were gauged before the Patriot balls were re-inflated. Exponent should have asked about this and should have clearly stated it if it were provided such information. If not, they should have, “to be fair,” at least considered the possibility that the Colt balls were gauged later in the locker room period as an explanation for the differences of a few tenths of air pressure.

Burying the Logo and Non-Logo average PSI results

So, what happens if they were to explicitly note the PSI differences in their table as well as including Colt measurements at 11 or 12 minutes, the times that they were likely to be gauged?

Table 13 Updated

An updated version of Exponent’s table 13, showing Non-Logo Gauge Master-Adjusted results with a 71-degree pre-game temperature. This table includes a later measurement time for the Colts as well as explicitly calling out the differences between the expected and observed halftime values.

Now, for example, it’s crystal clear that an approximate 4-and-a-half minute measure time for New England and 11-minute measure time for Indianapolis result in a difference of 0.3 PSI on the Non-Logo gauge between the Patriot and Colt balls. This is similar to what has been observed in more detailed analyses.

Forget the inclusion of a later Colt measurement though. Why doesn’t Exponent call out that differential since it’s perhaps the single most salient data point in their entire report? Without any corrections, it would reveal differences of a few tenths of PSI between the control (Colts) and Patriot Non-Logo readings. Would publishing that number have impacted people’s reactions to their conclusions?

What about the Logo gauge experiment in Table 14? The Patriot Master-adjusted Logo halftime average value was 11.21 PSI, hidden in the paragraph on the following page, meaning that their experiment again found Patriot balls 0.3-0.4 PSI below expected on the Logo gauge, with the pre-game temperature at 67 degrees.

Table 14 Updated

An updated version of Exponent’s table 14, showing Logo Gauge Master-Adjusted results with a 67-degree pre-game temperature. This table includes a later measurement time for the Colts as well as explicitly calling out the differences between the expected and observed halftime values.

Could water account for that small difference? Or a different temperature? Placing the pre-game temperature at something like 69 degrees will bring the Patriot balls about 0.1 PSI closer to expected. Again, this is something Exponent conveniently does not even consider, despite providing a plausible temperature range of 67-74 degrees and running misting tests that demonstrate an effect of wetness.

The Master Error — failing to use master projections for master results

And then there’s this enormous error.

In Figure 26 (a figure recycled again in Figure 30), Exponent used a Master-adjusted transient curve to demonstrate where the footballs are projected to be as they heat up at halftime. Only they fail to present an adjusted curve! Figure 26 is simply wrong.

The curve shows a dry starting halftime value of over 11.5 PSI for the expected Patriot values. But a Master-adjusted Patriot ball would actually be 12.17 PSI in the pre-game according to Exponent. A dry football is expected to be 11.20 PSI at 48 degrees if it were set at 12.17 PSI in a 67 degree environment in the pre-game, as Exponent is attempting to model. The graph is not master-adjusted, even though Exponent claims it is. It is a clear error and needs to be corrected.
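The 11.20 PSI figure can be checked with the constant-volume ideal gas relation, in which absolute pressure scales with absolute temperature. The sketch below is only a rough check under stated assumptions: the function name is hypothetical, atmospheric pressure is assumed to be the standard 14.7 psi (Exponent may have used a slightly different value), and the ball is treated as dry with fixed volume.

```python
ATMOSPHERIC_PSI = 14.7      # assumed standard sea-level pressure, not taken from the report
RANKINE_OFFSET = 459.67     # converts degrees Fahrenheit to degrees Rankine


def halftime_psi(pregame_psi: float, pregame_temp_f: float, field_temp_f: float) -> float:
    """Project gauge pressure after cooling, assuming an ideal gas at constant volume."""
    pregame_abs = pregame_psi + ATMOSPHERIC_PSI
    temp_ratio = (field_temp_f + RANKINE_OFFSET) / (pregame_temp_f + RANKINE_OFFSET)
    return pregame_abs * temp_ratio - ATMOSPHERIC_PSI


# Master-adjusted pre-game value of 12.17 PSI, set at 67 degrees, cooled to 48 degrees.
projected = halftime_psi(12.17, 67.0, 48.0)
print(f"Projected dry halftime start: {projected:.2f} PSI")
```

Under these assumptions the projection comes out at roughly 11.20 PSI, well below the 11.5+ PSI starting point drawn in Figure 26.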

What happens when it is corrected?

The Logo scenario that Exponent presents to support its case suddenly contradicts it. It makes their primary conclusion on page 55 simply wrong:

“Based on the above conclusions, although the relative ‘explainability’ of the results from Game Day are dependent on which gauge was used by Walt Anderson prior to the game, given the most likely timing of events during halftime, the Patriots halftime measurements do not appear to be explained by the environmental factors tested, regardless of the gauge used.”

Correcting this huge error would fundamentally alter this conclusion.

Incorrectly claiming that the pre-game temperature is set to help the Patriots

They continue to write, on page 54, that

“it is important again to note that values for the pre-game and halftime locker room temperatures shown in Figure 27 put the Patriots transient curves at their lowest possible positions.”

But this is completely backwards — yet another anti-Patriot error. In order to generate the lowest starting transient curve within the HVAC parameters, the pre-game temperature would be 74 degrees, producing a starting halftime value of 10.86 PSI. 67 degrees is actually the worst starting value for the Patriot differentials.
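The direction of this effect can be checked by sweeping the pre-game temperature across the 67 to 74 degree range. This is a minimal sketch using the constant-volume ideal gas relation; the function name is hypothetical, the 14.7 psi atmospheric pressure is an assumption, and the 12.17 PSI starting value and 48 degree field temperature are the figures quoted in this post.

```python
ATMOSPHERIC_PSI = 14.7      # assumed standard sea-level pressure, not taken from the report
RANKINE_OFFSET = 459.67     # converts degrees Fahrenheit to degrees Rankine


def halftime_psi(pregame_psi: float, pregame_temp_f: float, field_temp_f: float) -> float:
    """Project gauge pressure after cooling, assuming an ideal gas at constant volume."""
    pregame_abs = pregame_psi + ATMOSPHERIC_PSI
    temp_ratio = (field_temp_f + RANKINE_OFFSET) / (pregame_temp_f + RANKINE_OFFSET)
    return pregame_abs * temp_ratio - ATMOSPHERIC_PSI


# Starting halftime value for each whole-degree pre-game temperature from 67 to 74.
starting_values = {temp: halftime_psi(12.17, float(temp), 48.0) for temp in range(67, 75)}

for temp, psi in starting_values.items():
    print(f"pre-game {temp} F -> halftime start {psi:.2f} PSI")

lowest_temp = min(starting_values, key=starting_values.get)
print(f"Lowest starting curve is at {lowest_temp} F")
```

A warmer pre-game room means a larger temperature drop on the field, so the projected starting value falls as the pre-game temperature rises. Under these assumptions the lowest curve sits at 74 degrees (about 10.86 PSI) and the highest at 67 degrees (about 11.20 PSI).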

Inability to conceive of wetness as the explainable natural factor

The icing on the cake is that the differences in the Colt and Patriot measurements are in all likelihood the difference in their exposure to rain. For the uninitiated, this can be clearly seen in the gradient of differences among the Patriot balls that suggests some Patriot balls were exposed to more rain, and in particular those balls on the final drive of the half.

Yet on page 55, when discussing wetness as a factor, they write:

“According to Paul, Weiss, [a majority of wet balls] were most likely not present on Game Day.”

How can they say that, given the factors around wetness? They mention nothing of the Patriot balls being used more, and being in play at the end of the half. This is yet another anti-Patriot oversight. Remember, they presented back-to-back graphics in which water made differences on the order of 0.2 to 0.4 PSI from the “dry” condition, based on their own misting procedure. Despite the game being played in rain, Exponent concludes that results of the exact same magnitude cannot be explained by rain.

Conclusion

All told, the only time they seem to do something that isn’t anti-Patriot is when they create a row in Tables 13 and 14 for average measurement times that are improbably early in the locker room period. Otherwise, every misstep, omission and blatant error is decidedly anti-Patriot, and often committed in inexplicable fashion. In summary, Exponent demonstrates the following biases by:

  • Failing to account for halftime measurements in publishing p-values, despite knowing time of measurement is critical
  • Switching to an (unnecessarily) extremely low temperature projection for the Patriot Logo gauge
  • Misting footballs to simulate rain (and immediately toweling them off)
  • Not publishing the actual PSI differences between halftime measurements and expected measurements
  • Assuming the Colt balls are measured improbably early in the locker room period, and not considering later measurement times
  • Presenting Figures 26 and 30 with completely false transient curves, thereby altering their conclusions vis-a-vis the Logo gauge
  • Incorrectly claiming the pre-game indoor temperature of 67 degrees is a best-case for the Patriots
  • Not considering wetness as an explanation for the few tenths difference despite finding a few tenths difference from wetness

12 thoughts on “Deflategate: Exponent’s Bias and the Master Error”

  1. Regarding the extreme low temperature of 67 degrees, p. 48 of the Exponent report states: “Measurements taken by Exponent on February 7th, 2015 in the shower area ranged between 67 degrees F and 71 degrees F. The shower area is neither actively heated nor cooled and is typically colder than the dressing area of the Officials Locker Room (which is constrained between 71°F and 74°F by the building HVAC).”

    The report doesn’t indicate whether the 4 degree differential is based on measurements taken at different times of day, or taken in different places within the shower area. I assume it’s the latter. Outside, Feb 7 had a temperature high for the day of 25 degrees if I remember correctly, whereas when Anderson was gauging the balls in the shower area on Jan 18, it was in the low 50s outside. The shower area is presumably exchanging heat with both the 71-74 degree sitting room and with the outdoors.
    1) I think you are suggesting that if the temperature in the shower area drops 4 degrees when the differential between the HVAC room and the outdoors is about 45 degrees, it will drop less than 4 degrees when the differential with the outside is 20 degrees, and that the temperature in the shower area would not have dropped as low as 67 degrees at the time when Anderson was gauging the balls. While 67 is a theoretical lower bound based on the facts collected by Exponent, it is an unlikely value…
    2) Since the gauging process itself was conducted after placing the 4 groups of balls in 4 separate spots in the shower area, it is not necessarily the case that the Patriots’ game balls and Colts’ game balls were gauged at identical temperatures. There’s not enough information about this as a source of variance. If there’s a temperature gradient in the shower area based on distance from the door to the HVAC controlled sitting room, or distance from an outside wall, this is another reason not to take the implied comparisons in Exponent figures 24-29 at face value. (Another variable might be the temperature of the surface with which the groups of balls were in contact; one group might have been piled in a metal sink, another on a concrete floor.) Even if the Colts’ balls were measured at a pregame temperature of 67 degrees, it doesn’t mean the Patriots’ balls couldn’t have been at a temperature of 68-69 when they were measured …
    So we have another potential cause of some of the variance, but not enough information to evaluate it. Wells did not get this level of detail about exactly how the balls were handled by Anderson.

    I think the Exponent report was drafted in sections by different individuals, which is why some sections fail to incorporate what was shown in other sections. It may be possible that the whole report is an exercise in cognitive bias. I don’t think you can separate bad science and bias that easily. I think that Wells somewhere along the way communicated the expectation that Exponent should find tampering, based on Wells’ interpretation of the other evidence. Without deliberate intention to distort, Exponent wouldn’t question the “right” results very much, but would look much more carefully at exculpatory results and would have seen if they went away with changed assumptions. I think that would have a lot to do with why only errors which “hurt” the patriots made it into the final report. I think it’s more about their inability to do good science in the face of receiving clues from Wells that they should find a particular result.

    • Some of these are in the Wells report. The footballs were on the floor arranged in different quadrants of the shower room, based on team and backup vs game balls.

      One source I saw had the temperature at 19 degrees on Feb 7. A 50 degree difference vs a 20 degree difference. I think not just 67 but 72 and 73 degrees are in play.

      There is an additional factor, that two of the Patriots footballs were inflated by the referee. This inflation itself would increase the internal temperature of the football by 1.5C, meaning that the pressure would drop by the time they took the field.

    • Hi Joe —

      Yes, I’m suggesting a 4 degree drop is unlikely given game-day conditions. With regards to bias, I’m not talking about cognitive bias. I’m talking about a non-neutral report, either deliberately or subconsciously slanted. The odds are too extreme for this many coincidental rulings going against the Patriots in the manner that they did to say that the report is not presented in an anti-Patriot manner. You’re making my point when you say they only presented a final report that “hurt” the patriots. That defies the notion that “science cannot explain additional pressure loss,” when it can, and that is the very basis of their presence in this investigation.

  2. I didn’t mean to imply that the report was not full of biased assumptions and choices against the Patriots; I only meant to speculate about whether it was honest bad science, or dishonest. If you’re intentionally cooking the books, you’ll only look for good results and bury any bad ones, and make assumptions which help you do so. If you don’t know what the answer is supposed to be, I imagine you’d see a random pattern of errors. If you do know what the answer is supposed to be, I think that will bias your results and assumptions too, even if you are not deliberately intending to cook the books toward finding that answer. You will be more critical of arguments and results which are leading to the “wrong” answer, and rethink them or redo them, but not so with the arguments or results which support the expected answer.

    My actual thought is that Exponent exposed too many flaws (text and graphs which didn’t match each other) for this to be a good job of deliberately cooking the books, and so for them, I’d lean away from overt malice, and toward more unconscious forms of bias.

    Granted, Exponent has a business model which depends in part on finding answers which clients want, and that makes honesty a rather negotiable idea. But what really baffles me is Princeton physicist Daniel Marlow’s approval of all this. He doesn’t have the same financial incentives as Exponent, and he should have been a lot more insulated from having his thinking steered by day to day interaction with the Wells team. High energy physics isn’t a closely relevant specialty for the questions at issue in Deflategate; I would have thought an academic with more direct expertise could have been recruited. However he is described as an experimentalist; since a lot of the flaws have to do with experimental rigor, it surprises me that he missed the flaws initially, and especially surprises me that he apparently unconditionally supported it all again at the appeal hearing. By the time of the appeal he should have read the final Exponent report and some of the leading criticisms of it. Unlike the more anonymous researchers at Exponent, Marlow’s reputation individually is at stake, and intellectual honesty is expected in his field. I find his contribution more puzzling than Exponent’s.

    • Agree completely, especially with your second paragraph.

      I’ll add that I was floored by Marlow’s testimony. These are things other people have independently observed and are not disputable. The “statistical analysis” section makes no mention of time. The section that does compare projected vs. actual at a given time does not explicitly call out how minor the differences are, etc. Pages A-5, A-6, A-7 and A-9 explicitly state that “The main effect of team is estimated by the average change in pressure between the start of the game and halftime.” A-2 lists one-time measurements as if they were all measured at the same time. I am baffled as to how this was re-stated by alleged scientists, and cannot believe Brady’s legal team cannot cite this in court or that they were unable to correct this at the hearing.

      • And following up on this, you can see what looks like an enormous amount of ego/defensiveness now that the June 23 appeal has been publicized. There are a number of instances where their scientists were questioned and attempted to misdirect questions or flat out not answer them when they should have been very simple. Such behavior runs contrary to most of the scientific community, which welcomes criticisms and suggestions.

        Specifically, I’m thinking of the bit about not including time in achieving a statistically significant result, in which they did NOT include time, but when asked, started talking about how they included time somewhere else in their analysis.

  3. “Their explanation for using 67 degrees is so the Colt measurements align with the projections. This is a reasonable approach, given that the Colt balls “should” obey the laws of physics, … ”

    Am I mistaken, or did they align the Colts measurements with the transient curve projections at the middle of halftime? The transient curves should top out near 13.0, where the Colts balls are aligned, if the Colts balls were measured at the end of the half.

  4. “Failing to account for halftime measurements in publishing p-values, despite knowing time of measurement is critical”

    Did you mean to say “Failing to account for halftime measurement times in …”?
