You are correct in your assumption that error bars representing the standard error of the mean are inappropriate for within-subject designs. However, the question of overlapping error bars and significance is yet another topic, to which I will come back at the end of this commented reference list.
There is a rich literature in psychology on within-subject confidence intervals or error bars that do exactly what you want. The reference work is clearly:
Loftus, G. R., & Masson, M. E. J. (1994). Using confidence intervals in within-subject designs. Psychonomic Bulletin & Review, 1(4), 476–490. doi:10.3758/BF03210951
However, their approach has the problem that it uses the same error term for all levels of a within-subject factor. This does not seem to be a huge problem in your case (2 levels), but there are more modern approaches that solve this problem. Most notably:
Franz, V. H., & Loftus, G. R. (2012). Standard errors and confidence intervals in within-subjects designs: Generalizing Loftus and Masson (1994) and avoiding the biases of alternative accounts. Psychonomic Bulletin & Review, 19(3), 395–404. doi:10.3758/s13423-012-0230-1
Baguley, T. (2012). Calculating and graphing within-subject confidence intervals for ANOVA. Behavior Research Methods, 44(1), 158–175. doi:10.3758/s13428-011-0123-7
Further references can be found in the latter two papers (which I think are both worth a read).
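To make the basic idea concrete, here is a minimal R sketch of the normalization underlying such within-subject error bars: remove each subject's mean, add back the grand mean, and compute per-condition standard errors on the normalized values. The data are made up, and this is only an approximation of the Loftus and Masson procedure (which pools the error term from the within-subjects ANOVA); the papers above discuss the necessary refinements.

    ## Hypothetical long-format data: one row per subject x condition
    set.seed(1)
    d <- data.frame(subject   = factor(rep(1:10, each = 2)),
                    condition = factor(rep(c("A", "B"), times = 10)),
                    y         = rnorm(20, mean = rep(c(10, 12), 10)))

    ## Remove between-subject variability: center each subject on the grand mean
    grand.mean <- mean(d$y)
    subj.mean  <- ave(d$y, d$subject)          # each subject's own mean
    d$y.norm   <- d$y - subj.mean + grand.mean

    ## Per-condition standard errors of the normalized values
    aggregate(y.norm ~ condition, data = d,
              FUN = function(v) sd(v) / sqrt(length(v)))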
How do researchers interpret CIs? Badly, according to the following paper:
Belia, S., Fidler, F., Williams, J., & Cumming, G. (2005). Researchers Misunderstand Confidence Intervals and Standard Error Bars. Psychological Methods, 10(4), 389–396. doi:10.1037/1082-989X.10.4.389
How should we interpret overlapping and non-overlapping CIs?
Cumming, G., & Finch, S. (2005). Inference by Eye: Confidence Intervals and How to Read Pictures of Data. American Psychologist, 60(2), 170–180. doi:10.1037/0003-066X.60.2.170
One final thought (although this is not relevant to your case): if you have a split-plot design (i.e., within- and between-subject factors) in one plot, you can forget about error bars altogether. I would (humbly) recommend my raw.means.plot function in the R package plotrix.
Mostly it's that "it's been done that way in the past", but in some domains it is precisely because the authors are not drawing statistical inferences directly from the reported standard errors (even though, for the example paper, it might be reasonable to do so).
As an example, physics research papers often depict the standard errors corresponding to the (estimated) statistical errors in the data collection. These are usually estimated by running the same experiment multiple times (as far as possible) with the same setup and estimating the variance. However, these statistical errors are only very rarely used in a direct confidence-interval or degree-of-significance type of assessment. This is because in most experiments systematic errors of various types are likely to be larger than the statistical errors, and such errors are not amenable to statistical analysis. Thus, representing the 95% confidence interval based on just the statistical errors could be deceiving. Experimental particle physicists in particular go to great pains to identify statistical uncertainties and systematic uncertainties separately and then combine them (in ways approved by the physics community) into confidence intervals (the preprints on the discovery of the Higgs boson are easily found examples of this).
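To make that concrete with a toy example: one common (though by no means universal) convention is to combine independent statistical and systematic uncertainties in quadrature. The numbers below are made up, and real particle-physics analyses use far more elaborate, community-approved procedures.

    ## Toy example: combining uncertainty sources in quadrature
    stat.err <- 0.8    # statistical uncertainty (from repeated runs)
    syst.err <- 1.5    # systematic uncertainty (calibration, model, ...)

    ## Assuming the two sources are independent
    total.err <- sqrt(stat.err^2 + syst.err^2)
    total.err          # dominated by the systematic part, as discussed above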
Best Answer
According to your updated question, the claim of @onestop is still valid: it is not OK to call them standard errors. Furthermore, the method seems strange and quite non-standard. What was really done in your case is to divide the sample in two (values above and below the mean) and calculate the standard error of each half, not of your real sample; therefore, I personally find it strange to assign the lengths of the error bars in that way. Apparently the idea used here was taken from here. However, IMHO, the idea of dividing the sample and calculating an "upper" and a "lower" standard deviation doesn't make much sense (or at least it bothers me).
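If I understand it correctly, the procedure amounts to something like the following R sketch (shown only to make the criticism concrete, not as a recommendation; the data are made up):

    ## What the criticized procedure seems to amount to (illustration only)
    set.seed(4)
    x <- rexp(100)                     # some skewed sample
    m <- mean(x)

    upper.half <- x[x >= m]            # values above the mean
    lower.half <- x[x <  m]            # values below the mean

    ## "upper" and "lower" standard errors, one from each half
    se.upper <- sd(upper.half) / sqrt(length(upper.half))
    se.lower <- sd(lower.half) / sqrt(length(lower.half))
    c(lower = se.lower, upper = se.upper)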
In physics (my area and apparently yours), however, it has been somewhat standard to show 68% confidence intervals for the sample median or mean (depending on your choice of location statistic; let's call this statistic $\bar{X}$ for the moment) in the following way for non-symmetric distributions (apparently emulating what would be a central credible interval): with your data points, you calculate $\bar{X}$ and then report an upper error bar of length $L_u$, where $L_u$ is chosen to satisfy $P(\bar{X}<\mu<\bar{X}+L_u)=0.34$, with $\mu$ the real (unknown) parameter. For your lower error bar of length $L_l$, you repeat the same procedure, but now downward from the location statistic $\bar{X}$, i.e., $P(\bar{X}-L_l<\mu<\bar{X})=0.34$. Of course, because the distribution of $\bar{X}$ is usually not known, this is usually done with non-parametric methods (such as the bootstrap or some variant of it).
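As a concrete illustration, here is a minimal percentile-bootstrap sketch in R for such asymmetric 68% error bars around the sample median; the data and names are made up, and more refined bootstrap variants (e.g., BCa) may be preferable in practice.

    ## Asymmetric 68% error bars for the median via the percentile bootstrap
    set.seed(2)
    x    <- rexp(50)                   # some skewed sample
    xbar <- median(x)                  # the location statistic

    B    <- 10000
    boot <- replicate(B, median(sample(x, replace = TRUE)))

    ## The central 68% of the bootstrap distribution gives L_l and L_u
    ci  <- quantile(boot, c(0.16, 0.84))
    L.l <- xbar - ci[[1]]              # lower error-bar length
    L.u <- ci[[2]] - xbar              # upper error-bar length
    c(estimate = xbar, L.l = L.l, L.u = L.u)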
As was also pointed out by @onestop, you can also obtain Bayesian credible intervals, where you actually calculate the probability (density, in the continuous case) of your parameter given your data; let's call this density $p(x|D)$. The length of the lower error bar is now calculated in a more "natural" way (at least for me), in order to satisfy $P(\hat{x}-L_l<x<\hat{x}|D)=0.34$, and the length of the upper error bar is calculated to satisfy $P(\hat{x}<x<\hat{x}+L_u|D)=0.34$, where $\hat{x}$ is your point estimate of the parameter (usually the median or even the mode).
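Given a set of posterior draws (from an MCMC sampler, say), this translates into a few lines of R; the "posterior" below is simulated purely for illustration. Note that when $\hat{x}$ is the posterior median, the two conditions are satisfied exactly by the 16% and 84% posterior quantiles.

    ## Asymmetric error bars from posterior samples (illustration only)
    set.seed(3)
    post  <- rgamma(1e4, shape = 2)    # pretend these are posterior draws

    x.hat <- median(post)              # point estimate (posterior median)

    ## 34% of the posterior mass lies between the 16% and 50% quantiles,
    ## and another 34% between the 50% and 84% quantiles
    L.l <- x.hat - quantile(post, 0.16)[[1]]
    L.u <- quantile(post, 0.84)[[1]] - x.hat
    c(estimate = x.hat, L.l = L.l, L.u = L.u)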
All of the above, of course, makes sense only if the distribution of your parameter (or statistic) is unimodal.