Standard Deviation – How to Compute It for Differences Between Two Data Sets

I am running an experiment in which I collect two data sets, and I want to measure the difference between them. The two data sets are independent, follow unknown probability distributions, and may not always have the same length.

Calculating the mean difference is easy as pie, but I also want a measure of its standard deviation, and I'm not sure how to go about that. At first I simply took the standard deviation of the second data set minus the mean of the first set, but in retrospect I'm not sure that is entirely correct.

Any advice out there?

As an example of the data:

3.98   4.39   4.09   4.31   3.81   3.67   3.94   3.90   4.39   3.60   3.99   3.53   3.82

vs

3.95   4.51   4.49   4.43   4.55   4.41   4.68   4.22   4.45   4.59   4.42

Edit:
I want to etch this in stone a bit more and explain why a t-test is not the answer I'm looking for:
All computations are done in Matlab.
Note: var(one) = 3.61e-6 and var(two) = 5.01e-6.
Using gui11aume's answer:

    std = 2.93e-3

when doing

    [h,p,ci,stats] = ttest2(one, two)
    stats.sd = 2.08e-3

Realizing that the variances are likely unequal, one should rewrite this as:

    [h,p,ci,stats] = ttest2(one, two, 0.05, 'both', 'unequal')
    stats.sd = [1.9e-3 2.23e-3]
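To see where the figures above come from, here is a minimal sketch in Python (used only for illustration; the original computations are in Matlab). It assumes sample variances with an n-1 denominator, which is also the default for Matlab's var(), and it assumes, based on our reading of the ttest2 documentation, that stats.sd is the pooled standard deviation in the equal-variance case and the pair of per-sample standard deviations in the 'unequal' case. The helper name sd_figures is hypothetical. As a check on the arithmetic, the quoted 2.93e-3 is consistent with $\sqrt{3.61\times10^{-6} + 5.01\times10^{-6}} \approx 2.936\times10^{-3}$, the square root of the summed variances with no division by the sample sizes; that is the spread of a single difference $X - Y$, not of the difference of means.

```python
import math
import statistics


def sd_figures(one, two):
    """Hypothetical helper: the three spread figures discussed above.

    Assumes sample variances with an n-1 denominator (as Matlab's var()).
    """
    n1, n2 = len(one), len(two)
    v1, v2 = statistics.variance(one), statistics.variance(two)

    # Square root of the summed variances: the SD of a single difference
    # X - Y, with no division by the sample sizes.
    sd_single_difference = math.sqrt(v1 + v2)

    # Pooled SD under an equal-variance assumption: each sample variance
    # is weighted by its degrees of freedom (n_i - 1).
    sd_pooled = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))

    # Unequal-variance case: the two sample SDs are kept separate.
    sd_unpooled = (math.sqrt(v1), math.sqrt(v2))

    return sd_single_difference, sd_pooled, sd_unpooled


# Example data from the question.
one = [3.98, 4.39, 4.09, 4.31, 3.81, 3.67, 3.94, 3.90, 4.39, 3.60, 3.99, 3.53, 3.82]
two = [3.95, 4.51, 4.49, 4.43, 4.55, 4.41, 4.68, 4.22, 4.45, 4.59, 4.42]

print(sd_figures(one, two))
```

The example data are on a different scale from the quoted variances, so running this on them will not reproduce those particular numbers; the relationships between the three figures are what carry over.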

Once again, thank you for your time.

Best Answer

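If $X$ and $Y$ are independent, the variance of $X-Y$ is $\operatorname{Var}(X) + \operatorname{Var}(Y)$. The two sample means are independent, and the mean of $n$ independent observations with variance $\sigma^2$ has variance $\sigma^2/n$, so the variance of the difference of means is the sum of the variances of each mean, $\sigma_1^2/n_1 + \sigma_2^2/n_2$. This variance is unknown, but you can easily estimate it by the sum of the estimated variances, $S_1^2/n_1 + S_2^2/n_2$, where $S_i^2$ is the sample variance and $n_i$ the size of data set $i$. Its square root, $\sqrt{S_1^2/n_1 + S_2^2/n_2}$, is the standard error of the difference of means.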
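A minimal Python sketch of this estimate, applied to the example data from the question. It assumes the difference is taken as second set minus first (an arbitrary sign convention) and uses statistics.variance, which divides by n-1 like Matlab's var(). The function name diff_of_means_se is hypothetical; in Matlab the same standard error would be sqrt(var(one)/numel(one) + var(two)/numel(two)).

```python
import math
import statistics


def diff_of_means_se(one, two):
    """Hypothetical helper: mean difference and its standard error.

    Var(mean2 - mean1) is estimated by S1^2/n1 + S2^2/n2, where S_i^2 is
    the sample variance (n-1 denominator); the SE is its square root.
    """
    n1, n2 = len(one), len(two)
    diff = statistics.mean(two) - statistics.mean(one)
    se = math.sqrt(statistics.variance(one) / n1 + statistics.variance(two) / n2)
    return diff, se


# Example data from the question (the two sets have different lengths).
one = [3.98, 4.39, 4.09, 4.31, 3.81, 3.67, 3.94, 3.90, 4.39, 3.60, 3.99, 3.53, 3.82]
two = [3.95, 4.51, 4.49, 4.43, 4.55, 4.41, 4.68, 4.22, 4.45, 4.59, 4.42]

diff, se = diff_of_means_se(one, two)
print(f"mean difference = {diff:.4f}, standard error = {se:.4f}")
```

Because each variance is divided by its sample size, this standard error is always smaller than the square root of the summed variances, and it keeps shrinking as more data are collected.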