Paired Data Analysis – Regression or Paired t-test?

Tags: paired-data, regression, t-test

Question
I have always used a paired t-test or a Wilcoxon signed-rank test (depending on the dataset) to check whether two methods (on average) yielded the same results. After learning more about regression, I think this should work with regression too, but I can't figure out which would be "better" in which case.

Example
Let's take this (too small) example dataset, and assume that it's normally distributed.

data <- read.table(text = "  sample methodx methody
                   1      1    0.52    0.53
                   2      2    0.50    0.51
                   3      3    0.48    0.48
                   4      4    0.40    0.41
                   5      5    0.36    0.36
                   6      6    0.30    0.32
                   7      7    0.28    0.30
                   8      8    0.28    0.29", header = T)

# Regression analysis
model <- lm(data$methodx ~ data$methody)
summary(model)

# Residuals:
#   Min        1Q     Median        3Q       Max 
# -0.007317 -0.004931 -0.002012  0.004596  0.011341 
#
# Coefficients:
#             Estimate Std. Error    t value  Pr(>|t|)    
#  (Intercept)  -0.02341    0.01181  -1.983   0.0946 .  
#  data$methody  1.03354    0.02879  35.900   3.11e-08 ***
#   ---
#   Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 0.007374 on 6 degrees of freedom
# Multiple R-squared:  0.9954,  Adjusted R-squared:  0.9946 
# F-statistic:  1289 on 1 and 6 DF,  p-value: 3.115e-08

# Paired t-test
t.test(data$methodx, data$methody, paired = TRUE)

# Paired t-test
#
# data:  data$methodx and data$methody
# t = -3.7417, df = 7, p-value = 0.007247
# alternative hypothesis: true difference in means is not equal to 0
# 95 percent confidence interval:
#   -0.016319724 -0.003680276
# sample estimates:
#   mean of the differences 
#                     -0.01 

Looking at the regression: I see a high $R^2$ (0.9954), and the relationship seems linear, as the slope of the line is 1.03354. The paired t-test tells me to reject H0, which I suspect is because this dataset is far too small. In general, though, both seem able to tell me whether the methods on average give the same results. So when should I choose a linear regression, and when a paired t-test, when comparing two methods?

Best Answer

Your t-test answers the question you want, which is (in your own words) to "check whether two methods (on average) yielded the same results", and that side of your analysis looks right. It is simple, correct, and appropriate given your small sample size (n = 8).

Your regression model, however, is not set up to answer the same question, so it doesn't make sense to compare the two analyses head-to-head. Let's first look at what the model you specified is actually doing.

The model given is just predicting methodx from methody. This model is always of the form methodx = slope * methody + Intercept.

plot of models

If H0 is true, we would expect the slope to be 1 and the intercept to be 0. But even under H0 we would expect a very high correlation, so seeing one tells us very little. Moreover, the correlation does not change at all if the intercept changes! So I would ignore the $R^2$ figures in the default regression output entirely - they tell us nothing interesting about this specific problem.
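To see that correlation ignores the intercept, here is a quick sketch on simulated data (the variable names and numbers here are purely illustrative):

```r
# Correlation is unchanged by adding a constant to one variable
set.seed(42)
x <- rnorm(20)
y <- 1.03 * x + rnorm(20, sd = 0.1)   # near-identity relationship, some noise

cor(x, y)        # very high correlation
cor(x, y + 0.5)  # exactly the same, despite a large "intercept" shift
```

So a systematic offset between the two methods - precisely the thing a paired t-test detects - is invisible to correlation and to $R^2$.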

To falsify H0, we have to make and defend one of the following statements:

  1. the intercept is significantly different than 0.
  2. the slope is significantly different than 1.
  3. the combination of slope and intercept is significantly different than 1/0.

Statement #1 can (almost) be read off the regression summary: the t value and p-value reported for the intercept test whether it differs from zero. But notice that this t value is much lower, and the p-value much higher, than in your direct paired t-test. The t-test of a single coefficient must be understood as controlling for all other variables in the model - in this case, the slope. That is a fundamentally different question from the one we originally asked.
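For reference, the numbers for Statement #1 can be pulled straight out of the fitted model object (a sketch; the data frame is rebuilt here so the snippet stands alone):

```r
data <- data.frame(
  methodx = c(0.52, 0.50, 0.48, 0.40, 0.36, 0.30, 0.28, 0.28),
  methody = c(0.53, 0.51, 0.48, 0.41, 0.36, 0.32, 0.30, 0.29)
)
model <- lm(methodx ~ methody, data = data)

# t value and p-value for H0: intercept = 0, controlling for the slope
coef(summary(model))["(Intercept)", c("t value", "Pr(>|t|)")]
```

These are the same -1.983 and 0.0946 shown in the summary above - nowhere near the paired t-test's p = 0.007.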

Statement #2 cannot be read directly off the regression summary, because by default the output compares the slope to 0, whereas you want to compare it to 1 - the slope under your null hypothesis. You can do this yourself as an exercise: subtract one from the fitted slope coefficient, divide by the reported "Std. Error" for the slope, and compare the resulting t statistic to a t distribution with the residual degrees of freedom. However, this still entangles the slope and the intercept, and it is not the "difference in means" you are looking for, so I don't recommend it either.
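That computation looks like this (a sketch; the intermediate variable names are mine):

```r
data <- data.frame(
  methodx = c(0.52, 0.50, 0.48, 0.40, 0.36, 0.30, 0.28, 0.28),
  methody = c(0.53, 0.51, 0.48, 0.41, 0.36, 0.32, 0.30, 0.29)
)
model <- lm(methodx ~ methody, data = data)

slope    <- coef(summary(model))["methody", "Estimate"]
slope_se <- coef(summary(model))["methody", "Std. Error"]

t_stat <- (slope - 1) / slope_se   # shift the null from slope = 0 to slope = 1
p_val  <- 2 * pt(abs(t_stat), df = df.residual(model), lower.tail = FALSE)
```

For this data the resulting p-value is well above 0.05, i.e. the slope is not distinguishable from 1.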

For Statement #3, we want to test both together. We can do this with ANOVA:

First, we build a (somewhat trivial) model with the slope forced to 1 and the intercept forced to 0. This model has no free parameters at all, but R is happy to build an object of class lm for us, which is what we want.

null_model <- lm(methodx ~ offset(1 * methody) - 1, data = data)

Then you can compare this to the model you fit above:

anova(model, null_model)

When I do this for your data, I get:

Analysis of Variance Table

Model 1: methodx ~ methody
Model 2: methodx ~ offset(1 * methody) - 1
  Res.Df        RSS Df   Sum of Sq      F  Pr(>F)  
1      6 0.00032622                                
2      8 0.00120000 -2 -0.00087378 8.0355 0.02009 *

So the omnibus p-value is 0.02. Under the hood this is an F-test, asking: "does the more complicated model explain more of the RSS than would be expected by pure chance?"
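You can reproduce that F statistic by hand from the two RSS values printed in the table (a sketch using the numbers above):

```r
rss_full <- 0.00032622; df_full <- 6   # methodx ~ methody
rss_null <- 0.00120000; df_null <- 8   # slope forced to 1, no intercept

# F = (extra RSS per extra parameter) / (residual variance of the full model)
f_stat <- ((rss_null - rss_full) / (df_null - df_full)) / (rss_full / df_full)
p_val  <- pf(f_stat, df1 = df_null - df_full, df2 = df_full, lower.tail = FALSE)

c(F = f_stat, p = p_val)   # matches the anova() row above: F = 8.0355, p = 0.02009
```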

You might want this kind of omnibus test if you're not sure whether the true slope relating methodx and methody is exactly 1. That is the only thing we gain over simply applying a t-test to the paired differences. Note that this means we are "taking credit" (in the sense of reporting a lower, more significant p-value) for any difference in slope we observe. Depending on the experiment, this may be exactly backwards - a slope different from 1 might indicate inconsistent methodology between the two measurements, for example.
