This is not an easy thing, even for respected statisticians. Look at one recent attempt by Nate Silver:
... if I asked you to tell me how often your commute takes 10 minutes longer than average — something that requires some version of a confidence interval — you’d have to think about that a little bit, ...
(from the FiveThirtyEight blog in the New York Times, 9/29/10.) This is not a confidence interval. Depending on how you interpret it, it's either a tolerance interval or a prediction interval. (Otherwise there's nothing the matter with Mr. Silver's excellent discussion of estimating probabilities; it's a good read.) Many other web sites (particularly those with an investment focus) similarly confuse confidence intervals with other kinds of intervals.
The New York Times has made efforts to clarify the meaning of the statistical results it produces and reports on. The fine print beneath many polls includes something like this:
In theory, in 19 cases out of 20, results based on such samples of all adults will differ by no more than three percentage points in either direction from what would have been obtained by seeking to interview all American adults.
(e.g., How the Poll Was Conducted, 5/2/2011.)
A little wordy, perhaps, but clear and accurate: this statement characterizes the variability of the sampling distribution of the poll results. That's getting close to the idea of a confidence interval, but it is not quite there. One might, however, consider using such wording in place of confidence intervals in many cases.
When there is so much potential confusion on the internet, it is useful to turn to authoritative sources. One of my favorites is Freedman, Pisani, & Purves' time-honored text, Statistics. Now in its fourth edition, it has been used at universities for over 30 years and is notable for its clear, plain explanations and focus on classical "frequentist" methods. Let's see what it says about interpreting confidence intervals:
The confidence level of 95% says something about the sampling procedure...
[at p. 384; all quotations are from the third edition (1998)]. It continues,
If the sample had come out differently, the confidence interval would have been different. ... For about 95% of all samples, the interval ... covers the population percentage, and for the other 5% it does not.
[p. 384]. The text says much more about confidence intervals, but this is enough to help: its approach is to move the focus of discussion onto the sample, at once bringing rigor and clarity to the statements. We might therefore try the same thing in our own reporting. For instance, let's apply this approach to describing a confidence interval of [34%, 40%] around a reported percentage difference in a hypothetical experiment:
"This experiment used a randomly selected sample of subjects and a random selection of controls. We report a confidence interval from 34% to 40% for the difference. This quantifies the reliability of the experiment: if the selections of subjects and controls had been different, this confidence interval would change to reflect the results for the chosen subjects and controls. In 95% of such cases the confidence interval would include the true difference (between all subjects and all controls) and in the other 5% of cases it would not. Therefore it is likely--but not certain--that this confidence interval includes the true difference: that is, we believe the true difference is between 34% and 40%."
(This is my text, which surely can be improved: I invite editors to work on it.)
A long statement like this is somewhat unwieldy. In actual reports most of the context--random sampling, subjects and controls, possibility of variability--will already have been established, making half of the preceding statement unnecessary. When the report establishes that there is sampling variability and exhibits a probability model for the sample results, it is usually not difficult to explain a confidence interval (or other random interval) as clearly and rigorously as the audience needs.
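The "in 95% of such cases" interpretation is easy to check by simulation. Here is a minimal sketch in R; the true difference (37%, the midpoint of the hypothetical interval above), the sample size, and the seed are invented for illustration, not taken from any of the examples discussed:

```r
# Simulate the frequentist interpretation: across repeated samples,
# about 95% of the resulting confidence intervals cover the truth.
set.seed(17)              # arbitrary seed for reproducibility
p.true <- 0.37            # hypothetical true difference
n      <- 1000            # hypothetical sample size
n.sim  <- 10000           # number of repeated samples

covers <- replicate(n.sim, {
  p.hat <- mean(rbinom(n, 1, p.true))        # sample proportion
  se    <- sqrt(p.hat * (1 - p.hat) / n)     # estimated standard error
  ci    <- p.hat + c(-1, 1) * 1.96 * se      # 95% confidence interval
  ci[1] <= p.true && p.true <= ci[2]         # does it cover the truth?
})
mean(covers)   # close to 0.95
```

Any one interval either covers the true difference or it does not; the 95% describes the procedure over repeated sampling, which is exactly the point of the Freedman, Pisani, & Purves wording.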
For addition and subtraction, uncertainties add in quadrature, so the combined uncertainty is:
$$ u_c = \sqrt{u_m^2+u_r^2+u_w^2}$$
For an individual test, the uncertainty can be estimated from the standard deviation of the mean:
$$ s_m = \frac{s}{\sqrt{n}}$$
where $s$ is the standard deviation of the sample and $n$ is the sample size. This should then be multiplied by a coverage factor to get the desired confidence interval:
$$ u=ks_m$$
For a normal distribution at 95% confidence $k=1.96$ ($\simeq2$).
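A short R sketch of this recipe, assuming independent results on three tests (but see the correlation caveat further below); the scores and variable names are invented placeholders:

```r
# Combined uncertainty for a total score across three tests,
# assuming the per-test results are independent.
math    <- c(72, 81, 68, 90, 77, 85)   # made-up scores
reading <- c(65, 70, 74, 80, 69, 73)
writing <- c(60, 66, 71, 64, 75, 68)

sem <- function(x) sd(x) / sqrt(length(x))   # standard deviation of the mean

u   <- c(sem(math), sem(reading), sem(writing))
u.c <- sqrt(sum(u^2))    # add in quadrature
k   <- 1.96              # coverage factor for 95% (normal)
k * u.c                  # half-width of the 95% confidence interval
```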
In MATLAB the easiest way is to calculate the standard deviation with std() (or the variance with var()) and go from there. I don't use Stata/SAS, so I don't know whether they have specific functions for this.
I would also think carefully about the meaning of your test. You should probably normalise the scores so they are all on the same scale. Additionally, I suspect there may be some correlation between the results for a single student taking all three tests: students are likely to be good, or bad, at all of the tests. If this is the case, you would get a larger confidence interval by taking a sample of different students' total scores than by totalling separate samples on each test (which is what you currently do), as the sketch below illustrates.
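To see why the correlation matters, here is a hedged simulation sketch: it generates students whose ability carries across all three tests (the correlation structure, means, and spreads are all invented for illustration) and compares the quadrature combination with the direct uncertainty of per-student totals:

```r
# Illustration: when scores are positively correlated across tests,
# combining per-test uncertainties in quadrature understates the
# uncertainty of the total. All parameters here are invented.
set.seed(42)
n.students <- 200
ability <- rnorm(n.students, 0, 10)             # shared student ability
math    <- 70 + ability + rnorm(n.students, 0, 5)
reading <- 70 + ability + rnorm(n.students, 0, 5)
writing <- 70 + ability + rnorm(n.students, 0, 5)

sem <- function(x) sd(x) / sqrt(length(x))
# Quadrature combination (assumes independence):
sqrt(sem(math)^2 + sem(reading)^2 + sem(writing)^2)
# Direct uncertainty of per-student totals (respects correlation):
sem(math + reading + writing)   # noticeably larger
```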
(Aside on the general case)
In general, for any function $f(x_1, x_2, \ldots, x_m)$ where all $x_i$ are independent of each other and have associated uncertainties $u_i$, the combined uncertainty $u_c$ is given by:
$$ u_c^2 = \sum_{i=1}^m\left(\frac{\partial f}{\partial x_i}\right)^2 u_i^2$$
You should make sure all $u_i$ have the same coverage factor. You can calculate the uncertainties with $k=1$ and then expand the combined uncertainty if you need to.
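A generic R sketch of this formula, using a finite-difference approximation of the partial derivatives; the function `propagate` and the example inputs are hypothetical names chosen for illustration:

```r
# General propagation of uncertainty for f(x1, ..., xm) with
# independent inputs, via numerical partial derivatives.
propagate <- function(f, x, u, h = 1e-6) {
  grad <- sapply(seq_along(x), function(i) {
    dx <- replace(numeric(length(x)), i, h)
    (f(x + dx) - f(x - dx)) / (2 * h)    # central difference
  })
  sqrt(sum(grad^2 * u^2))                # quadrature with sensitivities
}

# Placeholder example: f(x) = x1 * x2 + x3
f <- function(x) x[1] * x[2] + x[3]
propagate(f, x = c(2, 3, 1), u = c(0.1, 0.2, 0.05))
```

For a sum, the partial derivatives are all 1, and this reduces to the quadrature formula given at the start.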
Best Answer
The confidence interval provided by the OP, (10.16, 12.01), is correct for the data provided. The SPSS output does not match these data, whether or not the population mean is subtracted (the t value, the CI, and the p-value are all incorrect). The output is either from a different example, or there was some error in the data passed to the function.
In R: