Solved – Why does a paired t-test (when appropriate) result in better variance

hypothesis testingpaired-comparisonst-testvariance

I didn't quite get the Wikipedia explanation here:

http://en.wikipedia.org/wiki/Paired_difference_test#Use_in_reducing_variance

I agree that both the unpaired and paired means are the same…then I see how

$\text{var}(\bar{Y_2} – \bar{Y_1})$

will include a covariance term..but what is the alternative variance? Take the paired differences first, and then the variance of the result? And how would that be any different?

Best Answer

Let's say we have the two conditions in Table 1. Each condition has a variance of 4 yielding a pooled variance of 4 as well and we double that to get the variance of the effect, 8. What if they were actually paired values and qualify for a paired t-test? We take the variance of the differences, the variance of the actual effect, which can be seen from the table to be 0 because they're all equal. This is the kind of thing that can happen when you have a paired test and how it can be more sensitive with a smaller standard error.

Table 1.

A1  A2  A1-A2
11   5   6
13   7   6
15   9   6
var(A1) = 4
var(A2) = 4
var(A1-A2) = 0

Related Solutions

Solved – Paired versus unpaired t-test

I agree with the points that both Frank and Peter make but I think there is a simple formula that gets to the heart of the issue and may be worthwhile for the OP to consider.

Let $X$ and $Y$ be two random variables whose correlation is unknown.

Let $Z=X-Y$

What is the variance of $Z$?

Here is the simple formula: $$ \text{Var}(Z)=\text{Var}(X) + \text{Var}(Y) - 2 \text{Cov}(X,Y). $$ What if $\text{Cov}(X,Y)>0$ (i.e., $X$ and $Y$ are positively correlated)?

Then $\text{Var}(Z)\lt \text{Var}(X)+\text{Var}(Y)$. In this case if the pairing is made because of positive correlation such as when you are dealing with the same subject before and after intervention pairing helps because the independent paired difference has lower variance than the variance you get for the unpaired case. The method reduced variance. The test is more powerful. This can be dramatically shown with cyclic data. I saw an example in a book where they wanted to see if the temperature in Washington DC is higher than in New York City. So they took average monthly temperature in both cities for say 2 years. Of course there is a huge difference over the course of the year because of the four seasons. This variation is too large for an unpaired t test to detect a difference. However pairing based on the same month in the same year eliminates this seasonal effect and the paired $t$-test clearly showed that the average temperature in DC tended to be higher than in New York. $X_i$ (temperature at NY in month $A$) and $Y_i$ (temperature in DC in month $A$) are positively correlated because the seasons are the same in NY and DC and the cities are close enough that they will often experience the same weather systems that affect temperature. DC may be a little warmer because it is further south.

Note that the large the covariance or correlation the greater is the reduction in variance.

Now suppose $\text{Cov}(X,Y)$ is negative.

Then $\text{Var}(Z) \gt \text{Var}(X)+\text{Var}(Y)$. Now pairing will be worse than not pairing because the variance is actually increased!

When $X$ and $Y$ are uncorrelated then it probably doesn't matter which method you use. Peter's random pairing case is like this situation.

Solved – Why does $r^2$ between two variables represent proportion of shared variance

One can only guess what one particular author might mean by "shared variance." We might hope to circumscribe the possibilities by considering what properties this concept ought (intuitively) to have. We know that "variances add": the variance of a sum $X+\varepsilon$ is the sum of the variances of $X$ and $\varepsilon$ when $X$ and $\varepsilon$ have zero covariance. It is natural to define the "shared variance" of $X$ with the sum to be the fraction of the variance of the sum represented by the variance of $X$. This is enough to imply the shared variances of any two random variables $X$ and $Y$ must be the square of their correlation coefficient.

This result gives meaning to the interpretation of a squared correlation coefficient as a "shared variance": in a suitable sense, it really is a fraction of a total variance that can be assigned to one variable in the sum.

The details follow.

Principles and their implications

Of course if $Y=X$, their "shared variance" (let's call it "SV" from now on) ought to be 100%. But what if $Y$ and $X$ are just scaled or shifted versions of one another? For instance, what if $Y$ represents the temperature of a city in degrees F and $X$ represents the temperature in degrees C? I would like to suggest that in such cases $X$ and $Y$ should still have 100% SV, so that this concept will remain meaningful regardless of how $X$ and $Y$ might be measured:

$$\operatorname{SV}(\alpha + \beta X, \gamma + \delta Y) = \operatorname{SV}(X,Y)\tag{1}$$

for any numbers $\alpha, \gamma$ and nonzero numbers $\beta, \delta$.

Another principle might be that when $\varepsilon$ is a random variable independent of $X$, then the variance of $X+\varepsilon$ can be uniquely decomposed into two non-negative parts,

$$\operatorname{Var}(X+\varepsilon) = \operatorname{Var}(X) + \operatorname{Var}(\varepsilon),$$

suggesting we attempt to define SV in this special case as

$$\operatorname{SV}(X, X+\varepsilon) = \frac{\operatorname{Var}(X)}{\operatorname{Var}(X) + \operatorname{Var}(\epsilon)}.\tag{2}$$

Since all these criteria are only up to second order--they only involve the first and second moments of the variables in the forms of expectations and variances--let's relax the requirement that $X$ and $\varepsilon$ be independent and only demand that they be uncorrelated. This will make the analysis much more general than it otherwise might be.

The results

These principles--if you accept them--lead to a unique, familiar, interpretable concept. The trick will be to reduce the general case to the special case of a sum, where we can apply definition $(2)$.

Given $(X,Y)$, we simply attempt to decompose $Y$ into a scaled, shifted version of $X$ plus a variable that is uncorrelated with $X$: that is, let's find (if it's possible) constants $\alpha$ and $\beta$ and a random variable $\epsilon$ for which

$$Y = \alpha + \beta X + \varepsilon\tag{3}$$

with $\operatorname{Cov}(X, \varepsilon)=0$. For the decomposition to have any chance of being unique, we should demand

$$\mathbb{E}[\varepsilon]=0$$

so that once $\beta $ is found, $\alpha$ is determined by

$$\alpha = \mathbb{E}[Y] - \beta\, \mathbb{E}[X].$$

This looks an awful lot like linear regression and indeed it is. The first principle says we may rescale $X$ and $Y$ to have unit variance (assuming they each have nonzero variance) and that when it is done, standard regression results assert the value of $\beta$ in $(3)$ is the correlation of $X$ and $Y$:

$$\beta = \rho(X,Y)\tag{4}.$$

Moreover, taking the variances of $(1)$ gives

$$1 = \operatorname{Var}(Y) = \beta^2 \operatorname{Var}(X) + \operatorname{Var}(\varepsilon) = \beta^2 + \operatorname{Var}(\varepsilon),$$

implying

$$\operatorname{Var}(\varepsilon) = 1-\beta^2 = 1-\rho^2.\tag{5}$$

Consequently

$$\eqalign{ \operatorname{SV}(X,Y) &= \operatorname{SV}(X, \alpha+\beta X + \varepsilon) &\text{(Model 3)}\\ &= \operatorname{SV}(\beta X, \beta X + \varepsilon) &\text{(Property 1)}\\ &= \frac{\operatorname{Var}(\beta X)}{\operatorname{Var}(\beta X) + \operatorname{Var}(\epsilon)} & \text{(Definition 2)}\\ &= \frac{\beta^2}{\beta^2 + (1-\beta^2)} = \beta^2 &\text{(Result 5)}\\ & = \rho^2 &\text{(Relation 4)}. }$$

Note that because the regression coefficient on $Y$ (when standardized to unit variance) is $\rho(Y,X)=\rho(X,Y)$, the "shared variance" itself is symmetric, justifying a terminology that suggests the order of $X$ and $Y$ does not matter:

$$\operatorname{SV}(X,Y) = \rho(X,Y)^2 = \rho(Y,X)^2 = \operatorname{SV}(Y,X).$$