Difference in Means vs. Mean Difference

Tags: mean, paired-comparisons, paired-data

When studying the means of two independent samples, we are told we are looking at the "difference of two means". This means we take the mean from population 1 ($\bar y_1$) and subtract from it the mean from population 2 ($\bar y_2$). So, our "difference of two means" is $\bar y_1 - \bar y_2$.

When studying the means of paired samples, we are told we are looking at the "mean difference", $\bar d$. This is calculated by taking the difference within each pair, and then taking the mean of all those differences.

My question is: do we get the same value for $\bar y_1 - \bar y_2$ and for $\bar d$ if we calculate them from the same two columns of data, the first time treating the columns as two independent samples and the second time treating them as paired data? I have played around with two columns of data, and it seems that the values are the same! In that case, can it be said that the different names are used for just non-quantitative reasons?

Best Answer

(I'm assuming you mean "sample" and not "population" in your first paragraph.)

The equivalence is easy to show mathematically. Start with two samples of equal size, $\{x_1,\dots,x_n\}$ and $\{y_1,\dots,y_n\}$. Then define $$\begin{align} \bar x &= \frac{1}{n} \sum_{i=1}^n x_i \\ \bar y &= \frac{1}{n} \sum_{i=1}^n y_i \\ \bar d &= \frac{1}{n} \sum_{i=1}^n \left( x_i - y_i \right) \end{align}$$

Then you have: $$\begin{align} \bar x - \bar y &= \left( \frac{1}{n} \sum_{i=1}^n x_i \right) - \left( \frac{1}{n} \sum_{i=1}^n y_i \right) \\ &= \frac{1}{n} \left( \sum_{i=1}^n x_i - \sum_{i=1}^n y_i \right) \\ &= \frac{1}{n} \left( \left( x_1 + \dots + x_n \right) - \left( y_1 + \dots + y_n \right) \right) \\ &= \frac{1}{n} \left( x_1 + \dots + x_n - y_1 - \dots - y_n \right) \\ &= \frac{1}{n} \left( x_1 - y_1 + \dots + x_n - y_n \right) \\ &= \frac{1}{n} \left( \left( x_1 - y_1 \right) + \dots + \left( x_n - y_n \right) \right) \\ &= \frac{1}{n} \sum_{i = 1}^n \left( x_i - y_i \right) \\ &= \bar d. \end{align}$$
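If you'd like to check this numerically, here is a minimal sketch in Python. The function name `difference_summaries` and the two data columns are made up purely for illustration; they are not from the question. It computes the difference of the two sample means and the mean of the pairwise differences, then confirms that they agree up to floating-point rounding.

```python
import math
from statistics import mean


def difference_summaries(x, y):
    """Return (x_bar - y_bar, d_bar) for two equal-length samples.

    Hypothetical helper for illustration only.
    """
    if len(x) != len(y):
        # Pairwise differences are only defined for samples of equal size.
        raise ValueError("samples must have the same length")
    diff_of_means = mean(x) - mean(y)
    mean_of_diffs = mean([xi - yi for xi, yi in zip(x, y)])
    return diff_of_means, mean_of_diffs


# Made-up data: two columns of five measurements each.
x = [5.1, 4.8, 6.0, 5.5, 4.9]
y = [4.7, 5.0, 5.2, 5.1, 4.4]

diff_of_means, mean_of_diffs = difference_summaries(x, y)

# Compare with a tolerance rather than ==, since floats may round differently.
print(math.isclose(diff_of_means, mean_of_diffs))
```

Note that the check only makes sense when both columns have the same length, which is why the sketch raises an error otherwise; the derivation above makes the same assumption.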