Why is standard error sometimes used for “error bands” in plots?

confidence interval, data visualization, interpretation, standard error, statistical significance

It seems that what someone usually wants to plot is a confidence interval of some kind, but using ±1 SE for this purpose only amounts to roughly a 68% confidence band. Plotting SE error bars instead of a wider band that matches the significance level of your analysis therefore visually suggests significance in your data that may not actually be there.
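As a quick check of that 68% figure (my addition, using base R's normal distribution functions), here is the coverage implied by ±1 SE versus the ±1.96 SE used for a 95% interval:

# Coverage of +/- 1 SE vs +/- 1.96 SE under a normal sampling distribution
pnorm(1) - pnorm(-1)        # ~0.68, i.e. roughly a 68% interval
pnorm(1.96) - pnorm(-1.96)  # ~0.95, i.e. a 95% interval
qnorm(0.975)                # ~1.96, the multiplier for a 95% CI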

Consider the following concrete example:

# Simulate two samples with different means and very different spreads
set.seed(123)
X <- rnorm(100, 0, 1)
Y <- rnorm(100, 1.7, 5)
df <- data.frame(X, Y)

boxplot(df)

# Standard errors of the two sample means
se.x <- sd(X) / sqrt(length(X))
se.y <- sd(Y) / sqrt(length(Y))

# Half-widths of the corresponding 95% confidence intervals
X.err.CI <- 1.96 * se.x
Y.err.CI <- 1.96 * se.y

# Left of the dashed line: means with 95% CI bars
plot(1:2, colMeans(df), ylim = c(-1, 3), xlim = c(0.5, 4.5), col = "darkgreen",
     main = "Comparison of SE bars vs 95% CI")
lines(c(1, 1), c(mean(X) + X.err.CI, mean(X) - X.err.CI), col = "darkgreen")
lines(c(2, 2), c(mean(Y) + Y.err.CI, mean(Y) - Y.err.CI), col = "darkgreen")
text(1:2 + .2, colMeans(df), c("X", "Y"))

# Right of the dashed line: the same means with +/- 1 SE bars
points(3:4, colMeans(df), col = "blue")
lines(c(3, 3), c(mean(X) + se.x, mean(X) - se.x), col = "blue")
lines(c(4, 4), c(mean(Y) + se.y, mean(Y) - se.y), col = "blue")
text(3:4 + .2, colMeans(df), c("X", "Y"))

abline(v = 2.5, lty = 2)

legend("topright",
       c("95% CI", "+/- SE"),
       lty = c(1, 1),
       pch = c(1, 1),
       col = c("darkgreen", "blue"))

[Plot: means of X and Y with 95% CI bars (left) and ±1 SE bars (right)]

If we just look at the SE bars (the right half of the plot), it visually appears that the means of X and Y differ significantly, because the error bars don't overlap. But if we're testing at the 5% significance level, plotting the 95% confidence intervals shows that this is clearly not the case.

Since a test at the 32% significance level will essentially never be appropriate, why show SE bars at all, given that they will probably be read as if they were a confidence interval? Do people use SE bars instead of more meaningful CIs because they're slightly easier to calculate (e.g. with a built-in function in Excel)? It seems we're paying a high cost in the interpretability of our graphic in exchange for a few minutes' less work. Is there some value/utility in SE bars that I'm missing?
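For comparison's sake, here is a small helper of my own (not from the post; the function name is made up) showing that the extra work over an SE bar is essentially one multiplication by a t (or normal) quantile:

# Hypothetical helper: SE and 95% CI half-width for a sample mean
mean.bars <- function(x, conf = 0.95) {
  n  <- length(x)
  se <- sd(x) / sqrt(n)                          # standard error of the mean
  ci <- qt(1 - (1 - conf) / 2, df = n - 1) * se  # t-based CI half-width
  c(mean = mean(x), se = se, ci.half.width = ci)
}

mean.bars(X)  # using X and Y from the example above
mean.bars(Y)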

For context, I was prompted to write this after skimming this article. I was frustrated by the lack of confidence intervals in the authors' plots, and when error bars did finally appear, they turned out to be just SE bars.

Best Answer

Mostly it's that "it's been done that way in the past", but in some fields it is precisely because the authors are not drawing statistical inferences directly from the reported standard errors (even though, for the example paper, it might be reasonable to do so).

As an example, physics research papers often depict standard errors reflecting the (estimated) statistical errors in the data collection. These are usually estimated by running the same experiment multiple times with (as nearly as possible) the same setup and estimating the variance. However, these statistical errors are only very rarely used in a direct confidence interval / significance-level type of assessment. This is because in most experiments systematic errors of various types are likely to be larger than the statistical errors, and systematic errors are not amenable to the same statistical analysis. Thus, presenting a 95% confidence interval based on the statistical errors alone could be deceiving. Experimental particle physicists in particular go to great lengths to identify statistical uncertainties and systematic uncertainties separately and then combine them (in ways approved by the physics community) into confidence intervals (the preprints on the discovery of the Higgs boson are easily found examples of this).
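As a purely illustrative sketch (mine, not from the answer or any particular paper), one common convention for independent uncertainty sources is to add them in quadrature before forming an interval, which shows why an interval built from the statistical error alone can be far too narrow:

# Illustrative only: hypothetical statistical and (larger) systematic uncertainties
stat.err <- 0.03
syst.err <- 0.10

# Combine independent uncertainties in quadrature
total.err <- sqrt(stat.err^2 + syst.err^2)

# A 95% interval from the statistical error alone is much narrower
1.96 * stat.err    # ~0.059
1.96 * total.err   # ~0.205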