Solved – Difference in Means vs. Mean Difference

Tags: mean, paired-comparisons, paired-data

When studying the means of two independent samples, we are told we are looking at the "difference of two means". This means we take the mean from population 1 ($\bar y_1$) and subtract from it the mean from population 2 ($\bar y_2$). So our "difference of two means" is $\bar y_1 - \bar y_2$.

When studying the means of paired samples, we are told we are looking at the "mean difference", $\bar d$. This is calculated by taking the difference within each pair and then taking the mean of all those differences.
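A minimal R sketch of the two calculations just described. The columns `y1` and `y2` are made-up example values, not data from the question:

```r
# Hypothetical example data: two columns of equal length
y1 <- c(12, 15, 11, 14, 13)
y2 <- c(10, 14, 9, 13, 10)

# "Difference of two means": average each column, then subtract
diff_of_means <- mean(y1) - mean(y2)

# "Mean difference": subtract within each row (pair), then average
d <- y1 - y2
mean_diff <- mean(d)

diff_of_means   # prints 1.8
mean_diff       # prints 1.8
```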

My question is: do we get the same value for $\bar y_1 - \bar y_2$ and $\bar d$ if we calculate them from the same two columns of data, treating the columns first as two independent samples and then as paired data? I have played around with two columns of data, and the values seem to be the same! In that case, can it be said that the different names are used for purely non-quantitative reasons?

Best Answer

(I'm assuming you mean "sample" and not "population" in your first paragraph.)

The equivalence is easy to show mathematically. Start with two samples of equal size, $\{x_1,\dots,x_n\}$ and $\{y_1,\dots,y_n\}$. Then define $$\begin{align} \bar x &= \frac{1}{n} \sum_{i=1}^n x_i \\ \bar y &= \frac{1}{n} \sum_{i=1}^n y_i \\ \bar d &= \frac{1}{n} \sum_{i=1}^n (x_i - y_i) \end{align}$$

Then you have: $$\begin{align} \bar x - \bar y &= \left( \frac{1}{n} \sum_{i=1}^n x_i \right) - \left( \frac{1}{n} \sum_{i=1}^n y_i \right) \\ &= \frac{1}{n} \left( \sum_{i=1}^n x_i - \sum_{i=1}^n y_i \right) \\ &= \frac{1}{n} \left( \left( x_1 + \dots + x_n \right) - \left( y_1 + \dots + y_n \right) \right) \\ &= \frac{1}{n} \left( x_1 + \dots + x_n - y_1 - \dots - y_n \right) \\ &= \frac{1}{n} \left( x_1 - y_1 + \dots + x_n - y_n \right) \\ &= \frac{1}{n} \left( \left( x_1 - y_1 \right) + \dots + \left( x_n - y_n \right) \right) \\ &= \frac{1}{n} \sum_{i = 1}^n (x_i - y_i) \\ &= \bar d. \end{align}$$
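The identity can be checked numerically. The sketch below uses made-up paired data `x` and `y` (hypothetical, six units measured under two conditions) and also touches on the last part of the question: the point estimate is the same under both analyses, but the standard error is not. The independent-samples analysis here is assumed to be the classic pooled test (`var.equal = TRUE`); the paired analysis uses only the spread of the differences `x - y`.

```r
# Hypothetical paired data: the same 6 units measured under two conditions
x <- c(20.1, 18.4, 22.7, 19.9, 21.3, 17.8)
y <- c(18.9, 17.6, 21.1, 19.0, 20.2, 16.5)
n <- length(x)

# Identical point estimates (all.equal allows for floating-point rounding)
all.equal(mean(x) - mean(y), mean(x - y))   # TRUE

# Independent-samples standard error: pooled variance times (1/n + 1/n)
sp2 <- ((n - 1) * var(x) + (n - 1) * var(y)) / (2 * n - 2)
se_indep <- sqrt(sp2 * (2 / n))

# Paired standard error: based only on the spread of the differences
se_paired <- sd(x - y) / sqrt(n)

c(se_indep = se_indep, se_paired = se_paired)

# t.test() divides the same estimate by these same standard errors
t_indep  <- t.test(x, y, var.equal = TRUE)$statistic
t_paired <- t.test(x, y, paired = TRUE)$statistic
```

With these made-up numbers the paired standard error is much smaller than the pooled one, because the two columns rise and fall together, so the t statistics differ even though the numerators are the same.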

Related Solutions

Solved – Formula confidence interval for difference in means – one sample t-test

The confidence interval provided by the OP (10.16, 12.01) is correct for the data provided. The SPSS output does not match this data, whether or not the population mean is subtracted. (t value incorrect, CI incorrect, p-value incorrect.) The output is either from a different example or there was some error in what data was passed to the function.

In R:

```r
A <- c(10, 12, 13, 11.5, 9, 11, 11.1, 11.9, 12.1, 9.3)

# Subtracting the hypothesised mean shifts the data (and the CI), but not t or p
B <- A - 11.5

t.test(A, mu = 11.5)

##  One Sample t-test
##  data:  A
##  t = -1.0013, df = 9, p-value = 0.3428
##  alternative hypothesis: true mean is not equal to 11.5
##  95 percent confidence interval:
##  10.16374 12.01626
##  sample estimates:
##  mean of x 
##      11.09

t.test(B, mu = 0)

##  One Sample t-test
##  data:  B
##  t = -1.0013, df = 9, p-value = 0.3428
##  alternative hypothesis: true mean is not equal to 0
##  95 percent confidence interval:
##  -1.3362575  0.5162575
##  sample estimates:
##  mean of x 
##      -0.41
```
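The numbers above follow from the textbook one-sample formulas $t = (\bar x - \mu_0)/(s/\sqrt n)$ and $\bar x \pm t_{0.975,\,n-1}\, s/\sqrt n$. A minimal by-hand sketch in base R (`sd`, `qt`, `pt`), reusing the same data; the variable names are just illustrative:

```r
A  <- c(10, 12, 13, 11.5, 9, 11, 11.1, 11.9, 12.1, 9.3)
mu <- 11.5                       # hypothesised population mean
n  <- length(A)

se     <- sd(A) / sqrt(n)        # standard error of the mean
t_stat <- (mean(A) - mu) / se    # approx. -1.0013, as in t.test()
t_crit <- qt(0.975, df = n - 1)  # two-sided 95% critical value, df = 9

ci_A  <- mean(A) + c(-1, 1) * t_crit * se  # approx. 10.16374 12.01626
ci_B  <- ci_A - mu                         # CI for the mean of B = A - 11.5
p_val <- 2 * pt(-abs(t_stat), df = n - 1)  # approx. 0.3428
```

Subtracting `mu` moves both interval endpoints by the same amount, while `t_stat` and `p_val` are unchanged, which is why the two `t.test()` calls agree on t and p.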

Solved – Why does a paired t-test (when appropriate) result in better variance

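Let's say we have the two conditions in Table 1. Each condition has a variance of 4, so the pooled variance is also 4, and if the conditions were independent we would double that to get the variance of the effect (the variance of an A1 - A2 difference): 4 + 4 = 8. What if they were actually paired values that qualify for a paired t-test? Then we take the variance of the differences, i.e. the variance of the actual effect, which the table shows to be 0 because every difference is 6. In general $\operatorname{Var}(A_1 - A_2) = \operatorname{Var}(A_1) + \operatorname{Var}(A_2) - 2\operatorname{Cov}(A_1, A_2)$, so when paired values are positively correlated the covariance term shrinks the variance of the differences (here the covariance is 4, giving $4 + 4 - 2 \cdot 4 = 0$). This is how a paired test, when appropriate, can be more sensitive: it has a smaller standard error.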

Table 1.

| A1 | A2 | A1 - A2 |
|---:|---:|---:|
| 11 | 5 | 6 |
| 13 | 7 | 6 |
| 15 | 9 | 6 |

var(A1) = 4, var(A2) = 4, var(A1 - A2) = 0
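A minimal R check of Table 1. Besides the three variances it computes the covariance between the columns, which is the term that cancels the two variances and leaves the differences with zero spread:

```r
A1 <- c(11, 13, 15)
A2 <- c(5, 7, 9)

v1     <- var(A1)         # 4
v2     <- var(A2)         # 4
v_diff <- var(A1 - A2)    # 0: every difference equals 6
cv     <- cov(A1, A2)     # 4: the columns move together perfectly

# Variance of a difference, including the covariance term
v1 + v2 - 2 * cv          # 0, same as v_diff

# Variance of a difference if the columns were independent (covariance 0)
v1 + v2                   # 8
```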
