Solved – Meaning of 2.04 standard errors? Significantly different means when confidence intervals widely overlap


The image below is from this article in Psychological Science. A colleague pointed out two unusual things about it:

  1. According to the caption, the error bars show "±2.04 standard errors, the 95% confidence interval." I've only ever seen ±1.96 SE used for the 95% CI, and I can't find anything about 2.04 SE being used for any purpose. Does 2.04 SE have some accepted meaning?
  2. The text states that planned pairwise comparisons found significant differences for mean startle magnitude in error vs. correct predictable trials (t(30)=2.51, p<.01) and error vs. correct unpredictable trials (t(30)=2.61, p<.01); the omnibus F test was also significant at p<.05. However, the graph shows the error bars for all three conditions overlapping substantially. If the ±2.04 SE intervals overlap, how can the values be significantly different at p<.05? The overlap is large enough that I assume the ±1.96 SE intervals would overlap as well.

[Figure: bar graph of the three condition means with ±2.04 SE error bars]

Best Answer

  1. $2.04$ is the multiplier to use with a Student t distribution with about $30$ degrees of freedom, rather than the $1.96$ of the normal distribution. The quoted statistics, $t(30)$, indicate $30$ degrees of freedom, for which the exact multiplier is $t_{0.975,\,30} = 2.042272 \approx 2.04$ (see the first sketch after this list).

  2. Means are compared in terms of standard errors. The standard error is typically $1/\sqrt{n}$ times the standard deviation, where $n$ (presumably $30+1=31$ here) is the sample size. If the caption is correct in calling these bars the "standard errors," then the standard deviations must be about $\sqrt{31} \approx 5.5$ times the plotted values of roughly $6$ (the second sketch below spells out this arithmetic). A dataset of $31$ positive values with a standard deviation of $6 \times 5.5 = 33$ and a mean between $14$ and $18$ would have to consist mostly of values near $0$ together with a few whopping big values, which seems quite unlikely. (If that were so, the entire analysis based on Student t statistics would be invalid anyway.) We should conclude that the figure likely shows standard deviations, not standard errors.

  3. Comparisons of means are not based on overlap (or lack thereof) of confidence intervals. Two 95% CIs can overlap even when the means differ highly significantly. The reason is that the standard error of the difference of (independent) means is, at least approximately, the square root of the sum of squares of the standard errors of the means. For example, if the standard error of a mean of $14$ equals $1$ and the standard error of a mean of $17$ equals $1$, then the CI of the first mean (using a multiple of $2.04$) extends from $11.96$ to $16.04$ and the CI of the second extends from $14.96$ to $19.04$, with substantial overlap. Nevertheless, the SE of the difference equals $\sqrt{1^2+1^2} \approx 1.41$, and the difference of means, $17-14=3$, exceeds $2.04$ times this value ($2.04 \times 1.41 \approx 2.88$): it is significant (the third sketch below works through these numbers).

  4. These are pairwise comparisons. The individual values can exhibit a lot of variability while their differences are highly consistent. For instance, a set of pairs like $(14, 14.01)$, $(15, 15.01)$, $(16, 16.01)$, $(17, 17.01)$, etc., exhibits variation in each component, but the differences are consistently $0.01$. Although this difference is small compared to either component, its consistency shows it is statistically significant (the last sketch below simulates this situation).
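
For anyone who wants to verify point 1, here is a minimal Python sketch (assuming SciPy is available); it uses only the confidence level and the degrees of freedom, nothing from the paper.

```python
from scipy import stats

# Half-width multiplier for a two-sided 95% CI: the 97.5th percentile of
# Student's t with 30 degrees of freedom.
print(stats.t.ppf(0.975, df=30))   # 2.0422724563012373 -> rounds to 2.04

# The familiar normal-distribution multiplier, for comparison.
print(stats.norm.ppf(0.975))       # 1.959963984540054 -> the usual 1.96
```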
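
Point 2 is a back-of-the-envelope calculation; the sketch below repeats it with the same rough figures (bars of about $6$ and $n = 31$), which are guesses read off the figure rather than values from the paper.

```python
import math

plotted_bar = 6.0   # approximate bar size cited in point 2 (a guess from the figure)
n = 31              # presumed sample size

implied_sd = plotted_bar * math.sqrt(n)
print(implied_sd)            # about 33.4: the SD implied if the bars really were SEs

# Relative to means between 14 and 18, that SD is roughly twice the mean,
# which is the implausible shape described in point 2.
print(implied_sd / 16.0)     # about 2.1
```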
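
The numbers in point 3 can be checked the same way; the means of $14$ and $17$ and the unit standard errors are the made-up values from that example, not data from the study.

```python
import math

mean1, se1 = 14.0, 1.0        # hypothetical mean and standard error
mean2, se2 = 17.0, 1.0
t_mult = 2.042272             # 95% multiplier for 30 degrees of freedom

# The individual 95% CIs overlap substantially...
ci1 = (mean1 - t_mult * se1, mean1 + t_mult * se1)   # about (11.96, 16.04)
ci2 = (mean2 - t_mult * se2, mean2 + t_mult * se2)   # about (14.96, 19.04)
print(ci1, ci2)

# ...yet the difference is significant, because the SE of the difference of
# independent means is the root-sum-of-squares of the individual SEs.
se_diff = math.sqrt(se1**2 + se2**2)                 # about 1.41
t_stat = (mean2 - mean1) / se_diff                   # about 2.12
print(t_stat, t_stat > t_mult)                       # 2.12 > 2.04 -> significant
```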
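
Finally, point 4 can be illustrated with simulated paired data. The data below are purely hypothetical (generated with NumPy and SciPy), chosen so the two conditions vary a lot from pair to pair while every within-pair difference is close to $0.01$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a = rng.uniform(14, 18, size=31)                 # e.g. 31 subjects with a large spread
b = a + 0.01 + rng.normal(0, 0.001, size=31)     # every pair differs by about 0.01

# Treated as independent samples, the tiny mean difference is swamped by the
# between-subject variability:
print(stats.ttest_ind(a, b))                     # p value near 1

# The paired comparison uses only the consistent differences and is
# overwhelmingly significant:
print(stats.ttest_rel(a, b))                     # p value far below 0.05
```

This is the sense in which widely overlapping per-condition error bars say little about the significance of a paired comparison.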
