Solved – Examples of costly consequences from improper use of statistical tools

dataset, methodology, teaching

I suspect that most users of statistical tools are ancillary users (folks who have had little to no formal training in statistics). It's very tempting for researchers and other professionals to apply statistical methods to their data simply because they have seen them used before in peer-reviewed papers, grey literature, on the web, or at conferences. However, doing so without a clear understanding of the required assumptions and the statistical tool's limitations can lead to erroneous results, and those errors often go unacknowledged!

I find that undergraduate students (particularly in the social and natural sciences) are either unaware of statistical pitfalls or find these pitfalls inconsequential (the latter being most often the case). Though examples of improper use of statistical tools can be found in many introductory-level textbooks, on the web, or on StackExchange, I have a difficult time finding real-world examples that have had detrimental results (e.g., dollars lost, lives impacted, careers ruined). To that end, I am looking for real-world examples that highlight the misuse of statistical methods for which:

  1. the statistical methods used are typically covered in introductory stats courses (e.g., inferential statistics, regression, etc.)
  2. the end results have had costly consequences (dollars lost, lives impacted, careers shattered, etc.)
  3. the data are readily available for use as working examples in a course (the purpose is to have students work through real-world examples that have had real-world consequences).

One non-statistical example I like to bring up to students when discussing the importance of properly defining the units in a research project is the "metric mishap" that led to the loss of NASA's $125M Mars Climate Orbiter! This usually evokes an 😮 reaction from the students and seems to leave a lasting impression (at least throughout their short academic lives).
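For a quick in-class illustration of how that unit mix-up plays out numerically, here is a minimal Python sketch. The impulse value below is made up; only the conversion factor (1 lbf ≈ 4.44822 N) and the nature of the error (pound-force seconds read as newton-seconds) reflect the actual incident.

```python
# Minimal illustration of the Mars Climate Orbiter unit mix-up:
# thruster impulse was reported in pound-force seconds (lbf*s) but
# consumed by navigation software expecting newton-seconds (N*s).
LBF_TO_N = 4.44822            # 1 pound-force in newtons

impulse_reported = 100.0      # hypothetical value, reported in lbf*s
actual_n_s = impulse_reported * LBF_TO_N   # what the number really means
assumed_n_s = impulse_reported             # what unconverted software "sees"

print(f"actual impulse:  {actual_n_s:.1f} N*s")
print(f"assumed impulse: {assumed_n_s:.1f} N*s "
      f"(a {actual_n_s / assumed_n_s:.2f}x underestimate)")
```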

Best Answer

I'm not sure about data availability, but a great (if that's the right word) example of poor statistics is the Harvard Nurses' Health Study on the effectiveness of hormone replacement therapy (HRT) in menopausal women.

What's the general idea? The Nurses' Health Study suggested that HRT was beneficial for post-menopausal women. It turns out that this result arose because the control group was very different from the treatment group, and these differences were not accounted for in the analysis. In subsequent randomized trials, HRT has been linked to cancer, heart attack, stroke, and blood clots. With appropriate corrections, the Nurses' Health Study data reveal these patterns as well.
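To make the confounding concrete for students, here is a small self-contained simulation (a sketch with made-up numbers, not the actual study data): healthier women are both more likely to take HRT and less likely to develop heart disease, so the naive comparison makes a genuinely harmful treatment look protective, while stratifying on the confounder recovers the true direction of the effect.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Confounder: overall health (True = healthy). This stands in for the
# socioeconomic/health differences between women who did and did not
# take HRT in the observational study.
healthy = rng.random(n) < 0.5

# Treatment assignment depends on the confounder: healthy women are far
# more likely to use HRT.
p_hrt = np.where(healthy, 0.8, 0.2)
hrt = rng.random(n) < p_hrt

# Outcome: healthy women have much lower baseline risk; HRT itself *adds*
# 2 percentage points of risk in this toy model.
p_disease = np.where(healthy, 0.02, 0.10) + 0.02 * hrt
disease = rng.random(n) < p_disease

def risk(mask):
    """Disease rate within the subgroup selected by the boolean mask."""
    return disease[mask].mean()

# Naive (confounded) comparison: HRT looks protective.
print(f"naive:     HRT {risk(hrt):.3f} vs no HRT {risk(~hrt):.3f}")

# Stratified comparison: within each stratum HRT is harmful.
for flag, label in [(True, "healthy"), (False, "unhealthy")]:
    m = healthy == flag
    print(f"{label:9s}: HRT {risk(m & hrt):.3f} vs no HRT {risk(m & ~hrt):.3f}")
```

Running it shows the pooled comparison favoring HRT even though HRT raises risk by the same two percentage points within each health stratum, which is exactly the Simpson's-paradox pattern at the heart of the HRT story.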

I can't find estimates for US deaths related to HRT, but the magnitude was in the tens of thousands. One article links 1,000 deaths in the UK to HRT.

This New York Times Magazine article provides a good statistical background on the confounding issues present in the study.

There's an academic discussion in this issue of the American Journal of Epidemiology. The articles compare the results of the observational Nurses' Health Study to those of the Women's Health Initiative, which was based on randomized trials.

There is also discussion (by many of the same individuals) in an issue of Biometrics; see Freedman and Petitti's comment in particular [prepub version].
