Solved – When is a deviation statistically significant?

algorithms, statistical significance

In a double-blind study, when are deviations from the control group considered statistically significant? And is this related to the number of samples?

I realise that every experiment is different and statistical significance should depend on the deviations in measurements and the size of the sample group, but I'm hoping there is an intuitive rule-of-thumb formula that can "flag" an event as interesting.

I have a light stats background in engineering, but am by no means a stats guru. A worked example would be much appreciated to help me understand and apply it to everyday things.


Update with example: OK, here's a (not so simple) thought experiment of what I mean. Suppose I want to measure the toxicity of additives in a village's water supply by comparing mortality rates over time. Given the village's population, natality and mortality rates over several years and the date upon which an additive was introduced into the water supply (disregard quantity), when would a rise in the mortality rate become interesting?

Intuitively, if the mortality rate remains between 0.95% and 1.25% for 10 years, and suddenly spikes to 2.00%, then surely this would be an interesting event if an additive was added that year (assume short-term toxic effects). Obviously there could be other explanations for the rise, but let's focus on statistical significance. Now, how about if it rises to 1.40%? Is that statistically significant? Where do you draw the line?
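
For concreteness, here is the kind of back-of-the-envelope check I would like to be able to justify. The village size (10,000) and the baseline rate (1.10%, the midpoint of the historical range) are numbers I have made up purely for illustration; the idea is to ask how surprising a given year's death count would be if the historical rate still applied:

    # Treat this year's deaths as a binomial draw at the historical rate and
    # ask how often a count at least this large would occur by chance alone.
    from scipy.stats import binom

    population = 10_000        # hypothetical village size
    baseline_rate = 0.0110     # hypothetical "typical" mortality rate

    for observed_rate in (0.0140, 0.0200):
        deaths = round(observed_rate * population)
        # P(at least this many deaths | the old rate still holds)
        p_value = binom.sf(deaths - 1, population, baseline_rate)
        print(f"{observed_rate:.2%}: {deaths} deaths, p = {p_value:.4f}")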

I'm starting to get the feeling that I need to "choose" a critical region, which feels less authoritative. Can a Gaussian distribution guide me on this? What other information do I need to determine statistical significance?

Best Answer

This question gets to the heart of statistical thinking by recognizing that both (a) "every experiment is different," implying no single "cookbook" recipe will suffice to assess experimental results in all cases and (b) "significance should depend on the deviations in measurements," pointing towards the importance of probability theory in modeling the deviations.

Unfortunately (a) indicates that a universal "simple formula" is not possible. However, some things can be said in general. These include

  • Deviations in measurements can be attributed partly to predictable phenomena as determined by properties of the subjects or the experiments. For example, weights of people depend (on average) partly on their gender. Deviations that are not predictable or determinate are usually modeled as random variables. How you decompose the deviations into these deterministic and random components constitutes a probability model. Probability models can be as simple as textbook descriptions of dice and coins, but for realistic situations they can be quite complex. (A small sketch of such a model appears after this list.)

  • Statistical "significance" measures the chance, when (hypothetically) the treatment has no effect, that some measure of difference between treatment and control groups would cause us to infer the groups are indeed different. This description is lengthy because so much is involved in its preparation: measuring the results, expressing the differences between the groups in some way (the test statistic), and selecting an inference procedure based on that statistic. Each of these things is under the control of the experimenter or observer and each is important. (A simulation illustrating this chance also appears after the list.)
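
To make the first bulleted point concrete, here is a minimal sketch of a probability model in that sense: a measurement (body weight) is treated as a deterministic component that depends on a known property of the subject plus a random deviation. The group means and the noise level are invented for illustration.

    # Minimal probability model: weight = deterministic part (a gender-specific
    # mean, made up here) + a random deviation.
    import numpy as np

    rng = np.random.default_rng(0)

    group_means = {"female": 70.0, "male": 82.0}   # hypothetical averages, kg
    noise_sd = 12.0                                # hypothetical person-to-person spread

    def simulated_weight(gender: str) -> float:
        """Deterministic component plus a Gaussian random component."""
        return group_means[gender] + rng.normal(0.0, noise_sd)

    print([round(simulated_weight("male"), 1) for _ in range(5)])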
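
The second point can likewise be illustrated by simulation. Under the hypothesis that the treatment does nothing, both groups draw outcomes from one common rate, and the "chance" described above is how often the difference in proportions would come out at least as large as the one observed. All the numbers below (group sizes, the common rate, the observed difference) are placeholders.

    # Simulate the "no effect" hypothesis: both groups share one underlying rate.
    # Significance is the chance of a difference at least as large as observed.
    import numpy as np

    rng = np.random.default_rng(1)

    n_treatment, n_control = 500, 500   # hypothetical group sizes
    common_rate = 0.011                 # hypothetical shared rate under "no effect"
    observed_diff = 0.008               # hypothetical observed difference in proportions
    n_simulations = 100_000

    treat = rng.binomial(n_treatment, common_rate, n_simulations) / n_treatment
    control = rng.binomial(n_control, common_rate, n_simulations) / n_control
    print(f"Estimated chance under 'no effect': {np.mean(treat - control >= observed_diff):.4f}")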

A simple textbook example concerns a controlled experiment in which only two outcomes are possible for each subject, such as "dies" and "lives". With good experimental design--double-blinding, randomization of subjects, careful measurement, etc.--we can view the experimental outcomes as behaving like random draws of tickets from a box, where each ticket is labeled with one of the two outcomes. This is the probability model. The test statistic is usually the difference in proportions between the two groups (e.g., the difference between their mortality rates). Statistical theory, in the form of the Neyman-Pearson Lemma, tells us to base our determination of the experimental result on whether this difference exceeds some predetermined threshold. Probability theory allows us to analyze this tickets-in-a-box model to come up with an appropriate threshold (the test's critical region). The theory shows precisely how that threshold depends on the sizes of the control and treatment groups.
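
To see how that threshold depends on the group sizes, here is a sketch using the usual normal approximation for the difference of two proportions; the baseline rate and the 5% (one-sided) significance level are assumptions chosen for illustration.

    # Normal-approximation threshold for the difference in proportions when
    # both groups have size n; the threshold shrinks roughly like 1/sqrt(n).
    from math import sqrt
    from scipy.stats import norm

    alpha = 0.05      # chosen one-sided significance level
    p0 = 0.011        # hypothetical common rate when the treatment has no effect
    z = norm.ppf(1 - alpha)

    for n in (100, 1_000, 10_000):
        se = sqrt(p0 * (1 - p0) * (1 / n + 1 / n))   # standard error of the difference
        print(f"n = {n:>6}: flag a difference larger than {z * se:.4f}")

With 100 subjects per group the difference must exceed about 2.4 percentage points before it is flagged, while with 10,000 per group about 0.24 percentage points suffice--the same deviation can be insignificant in a small study and highly significant in a large one.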

To go further, you need to learn some basic probability theory and see applications in some exemplary cases. This will accustom you to the habit of "thinking statistically" about everyday things. Two great resources are Gonick & Smith's Cartoon Guide to Statistics and the classic Freedman et al. textbook Statistics, used at Berkeley for several generations. (An older edition of the latter can be inspected online and bought used for almost nothing.)