R – Why Do Linear Regression and ANOVA Give Different p-Values for Interaction Terms?

Tags: anova, p-value, r, regression, statistical significance

I am trying to fit a single time series (without replicates) with a regression model.
The data look like this:

> xx.2
          value time treat
    1  8.788269    1     0
    2  7.964719    6     0
    3  8.204051   12     0
    4  9.041368   24     0
    5  8.181555   48     0
    6  8.041419   96     0
    7  7.992336  144     0
    8  7.948658    1     1
    9  8.090211    6     1
    10 8.031459   12     1
    11 8.118308   24     1
    12 7.699051   48     1
    13 7.537120   96     1
    14 7.268570  144     1

Because there are no replicates, I treat time as a continuous variable. The column "treat" distinguishes the case and control data.

First, I fit the model value ~ time*treat with lm() in R:

summary(lm(value~time*treat,data=xx.2))

Call:
lm(formula = value ~ time * treat, data = xx.2)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.50627 -0.12345  0.00296  0.04124  0.63785 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  8.493476   0.156345  54.325 1.08e-13 ***
time        -0.003748   0.002277  -1.646   0.1307    
treat       -0.411271   0.221106  -1.860   0.0925 .  
time:treat  -0.001938   0.003220  -0.602   0.5606

The p-values for time and treat are not significant.

With ANOVA, however, I get different results:

 summary(aov(value~time*treat,data=xx.2))
            Df Sum Sq Mean Sq F value Pr(>F)  
time         1 0.7726  0.7726   8.586 0.0150 *
treat        1 0.8852  0.8852   9.837 0.0106 *
time:treat   1 0.0326  0.0326   0.362 0.5606  
Residuals   10 0.8998  0.0900

The p-values for time and treat have changed.

If I understand correctly, the linear regression says that time and treat have no significant influence on value, whereas the ANOVA says that they do.

Could someone explain why these two methods give different results, and which one I should use?

Best Answer

The fits from lm() and aov() are identical; only the reporting differs. The t tests measure the marginal contribution of each variable, given the presence of all the other variables. The F tests are sequential: they test time in the presence of nothing but the intercept, treat in the presence of the intercept and time, and the interaction in the presence of all of the above. Because the interaction is the last term entered, its sequential F test and its t test coincide (F = t², here 0.362 ≈ (-0.602)²), which is why both report p = 0.5606 for time:treat.
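To see the sequential-versus-marginal distinction numerically, here is a minimal Python sketch (standard library only, not the author's R code) that refits the nested models by ordinary least squares on the data above. The helper `rss` is a hypothetical name introduced here; it solves the normal equations directly, which is adequate for a 4-column design like this one but is not a general-purpose regression routine. The sequential sums of squares should reproduce the `Sum Sq` column of the ANOVA table, while dropping only `treat` from the full model (keeping `time` and `time:treat`) gives an F statistic equal to the square of the `treat` t value.

```python
# Minimal sketch: sequential (Type I) vs. marginal ("drop one term") tests
# for value ~ time * treat, using plain least squares (no external libraries).

value = [8.788269, 7.964719, 8.204051, 9.041368, 8.181555, 8.041419, 7.992336,
         7.948658, 8.090211, 8.031459, 8.118308, 7.699051, 7.537120, 7.268570]
time = [1, 6, 12, 24, 48, 96, 144] * 2
treat = [0] * 7 + [1] * 7
one = [1.0] * len(value)
time_treat = [t * g for t, g in zip(time, treat)]  # interaction column


def rss(columns, y):
    """Residual sum of squares of y regressed on `columns` (a list of predictor columns)."""
    p, n = len(columns), len(y)
    # Augmented normal equations [X'X | X'y]
    a = [[sum(columns[i][k] * columns[j][k] for k in range(n)) for j in range(p)]
         + [sum(columns[i][k] * y[k] for k in range(n))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(a[r][c]))
        a[c], a[piv] = a[piv], a[c]
        for r in range(p):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [x - f * z for x, z in zip(a[r], a[c])]
    beta = [a[i][p] / a[i][i] for i in range(p)]
    fitted = [sum(beta[j] * columns[j][k] for j in range(p)) for k in range(n)]
    return sum((yk - fk) ** 2 for yk, fk in zip(y, fitted))


rss0 = rss([one], value)                                # intercept only
rss1 = rss([one, time], value)                          # + time
rss2 = rss([one, time, treat], value)                   # + treat
rss_full = rss([one, time, treat, time_treat], value)   # + time:treat
mse = rss_full / (len(value) - 4)                       # residual mean square, 10 df

# Sequential sums of squares: what anova() and aov() report
seq_ss = {"time": rss0 - rss1, "treat": rss1 - rss2, "time:treat": rss2 - rss_full}

# Marginal test for treat: drop only the treat column from the full model
rss_no_treat = rss([one, time, time_treat], value)
f_marginal_treat = (rss_no_treat - rss_full) / mse      # equals t^2 from summary(lm)

for term, ss in seq_ss.items():
    print(f"{term:>10}: sequential SS = {ss:.5f}, F = {ss / mse:.4f}")
print(f"marginal F for treat = {f_marginal_treat:.4f}")
```

The sequential F for treat (about 9.8) and the marginal F for treat (about 3.5) answer different questions, which is the whole source of the discrepancy in the question.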

Assuming you are interested in the significance of treat, I suggest fitting two models, one with treat and one without it, comparing them by passing both to anova(), and using that F test. Because the model without treat also cannot keep the time:treat interaction, this tests treat and the interaction simultaneously.
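The suggested comparison can be sketched in plain Python (standard library only) as a least-squares refit of the data above; in R it corresponds to `anova(lm(value ~ time, data = xx.2), lm(value ~ time * treat, data = xx.2))`. The reduced model keeps only the intercept and `time`, so the numerator has 2 degrees of freedom (treat and time:treat). For an F distribution with 2 numerator degrees of freedom the upper-tail probability has the closed form $(1 + 2F/d_2)^{-d_2/2}$, which avoids needing a statistics library; this shortcut does not hold for other numerator degrees of freedom. The helper `rss` is a hypothetical name introduced for this sketch.

```python
# Minimal sketch: nested-model F test of treat and time:treat jointly,
# i.e. value ~ time  versus  value ~ time * treat.

value = [8.788269, 7.964719, 8.204051, 9.041368, 8.181555, 8.041419, 7.992336,
         7.948658, 8.090211, 8.031459, 8.118308, 7.699051, 7.537120, 7.268570]
time = [1, 6, 12, 24, 48, 96, 144] * 2
treat = [0] * 7 + [1] * 7
one = [1.0] * len(value)
time_treat = [t * g for t, g in zip(time, treat)]


def rss(columns, y):
    """Residual sum of squares of y regressed on `columns` via the normal equations."""
    p, n = len(columns), len(y)
    a = [[sum(columns[i][k] * columns[j][k] for k in range(n)) for j in range(p)]
         + [sum(columns[i][k] * y[k] for k in range(n))] for i in range(p)]
    for c in range(p):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(c, p), key=lambda r: abs(a[r][c]))
        a[c], a[piv] = a[piv], a[c]
        for r in range(p):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [x - f * z for x, z in zip(a[r], a[c])]
    beta = [a[i][p] / a[i][i] for i in range(p)]
    fitted = [sum(beta[j] * columns[j][k] for j in range(p)) for k in range(n)]
    return sum((yk - fk) ** 2 for yk, fk in zip(y, fitted))


rss_reduced = rss([one, time], value)
rss_full = rss([one, time, treat, time_treat], value)

df_num = 2                    # parameters dropped: treat and time:treat
df_den = len(value) - 4       # residual df of the full model (10)
f_stat = ((rss_reduced - rss_full) / df_num) / (rss_full / df_den)

# Exact upper tail of F(2, df_den); valid only because df_num == 2
p_value = (1 + 2 * f_stat / df_den) ** (-df_den / 2)

print(f"F = {f_stat:.4f} on {df_num} and {df_den} df, p = {p_value:.4f}")
```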

Consider the following:

> xx.2 <- as.data.frame(matrix(c(8.788269, 1, 0,
+ 7.964719, 6, 0,
+ 8.204051, 12, 0,
+ 9.041368, 24, 0,
+ 8.181555, 48, 0,
+ 8.041419, 96, 0,
+ 7.992336, 144, 0,
+ 7.948658, 1, 1,
+ 8.090211, 6, 1,
+ 8.031459, 12, 1,
+ 8.118308, 24, 1,
+ 7.699051, 48, 1,
+ 7.537120, 96, 1,
+ 7.268570, 144, 1), byrow=T, ncol=3))
> names(xx.2) <- c("value", "time", "treat")
> 
> mod1 <- lm(value~time*treat, data=xx.2)
> anova(mod1)
Analysis of Variance Table

Response: value
           Df  Sum Sq Mean Sq F value  Pr(>F)  
time        1 0.77259 0.77259  8.5858 0.01504 *
treat       1 0.88520 0.88520  9.8372 0.01057 *
time:treat  1 0.03260 0.03260  0.3623 0.56064  
Residuals  10 0.89985 0.08998                  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
> mod2 <- aov(value~time*treat, data=xx.2)
> anova(mod2)
Analysis of Variance Table

Response: value
           Df  Sum Sq Mean Sq F value  Pr(>F)  
time        1 0.77259 0.77259  8.5858 0.01504 *
treat       1 0.88520 0.88520  9.8372 0.01057 *
time:treat  1 0.03260 0.03260  0.3623 0.56064  
Residuals  10 0.89985 0.08998                  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
> summary(mod2)
            Df Sum Sq Mean Sq F value Pr(>F)  
time         1 0.7726  0.7726   8.586 0.0150 *
treat        1 0.8852  0.8852   9.837 0.0106 *
time:treat   1 0.0326  0.0326   0.362 0.5606  
Residuals   10 0.8998  0.0900                 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
> summary(mod1)

Call:
lm(formula = value ~ time * treat, data = xx.2)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.50627 -0.12345  0.00296  0.04124  0.63785 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  8.493476   0.156345  54.325 1.08e-13 ***
time        -0.003748   0.002277  -1.646   0.1307    
treat       -0.411271   0.221106  -1.860   0.0925 .  
time:treat  -0.001938   0.003220  -0.602   0.5606    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 0.3 on 10 degrees of freedom
Multiple R-squared: 0.6526,     Adjusted R-squared: 0.5484 
F-statistic: 6.262 on 3 and 10 DF,  p-value: 0.01154
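Two consistency checks can be done by hand from the printed tables. The Python sketch below uses only the rounded numbers shown above: the overall F statistic at the bottom of `summary(mod1)` equals the sum of the three sequential sums of squares divided by 3 and by the residual mean square, and the squared t value of the last term (`time:treat`) equals its sequential F, whereas for `treat` the two differ because they test different hypotheses.

```python
# Consistency checks using the rounded values printed by anova(mod1) and summary(mod1).

seq_ss = {"time": 0.77259, "treat": 0.88520, "time:treat": 0.03260}
rss, df_res = 0.89985, 10
mse = rss / df_res

# Overall F: all three terms jointly, on 3 and 10 df
overall_f = (sum(seq_ss.values()) / 3) / mse

# t values from the coefficient table (Estimate / Std. Error)
t_treat = -0.411271 / 0.221106
t_int = -0.001938 / 0.003220

print(f"overall F = {overall_f:.3f}")
print(f"time:treat: t^2 = {t_int ** 2:.4f}, sequential F = {seq_ss['time:treat'] / mse:.4f}")
print(f"treat:      t^2 = {t_treat ** 2:.4f}, sequential F = {seq_ss['treat'] / mse:.4f}")
```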

Related Solutions

ANOVA – Why ANOVA and Regression Give Opposite Results in R

Essentially, the question is: how can one coefficient in the linear model be significantly different from 0 while ANOVA shows no significant effect, and vice versa?

For this, let's consider a simpler example.

set.seed( 123 )
data <- data.frame( x= rnorm( 100 ), g= rep( letters[1:10], each= 10 ) )
data$x[ data$g == "d" ] <- data$x[ data$g == "d" ] + 0.5
boxplot( x ~ g, data )
l <- lm( x ~ 0 + g, data )
summary( l )
anova( l )

You can see that only one group (d) stands out (has a coefficient significantly different from zero). However, because the nine other groups show no effect, the ANOVA returns $p > 0.1$. Now let us remove some of the groups:

data2 <- data[ data$g %in% c( "a", "d" ), ]
anova( lm( x ~ 0 + g, data2 ) )

returns

          Df  Sum Sq Mean Sq F value  Pr(>F)  
g          2  6.8133  3.4066  5.7363 0.01182 *
Residuals 18 10.6898  0.5939

ANOVA considers the overall variance within and between the groups. In the first case (10 groups), the between-group variance is smaller because of the many groups with no effect. In the second, there are only two groups, and all of the between-group variance comes from the difference between them.
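The dilution argument can be made concrete with the ratio of expected mean squares in a balanced one-way ANOVA, $E[MS_{between}]/E[MS_{within}] = 1 + n\sum_i(\mu_i - \bar\mu)^2/((k-1)\sigma^2)$. The Python sketch below plugs in idealized values that mirror the simulation above; they are assumptions, not the simulated data: n = 10 per group, σ = 1, and true means of 0 in every group except d, which is shifted by 0.5. It describes the conventional one-way ANOVA comparing the groups with each other, as the paragraph does; note that `anova(lm(x ~ 0 + g))` instead tests whether all group means are zero.

```python
# Ratio of expected mean squares for a balanced one-way ANOVA:
# E[MS_between] / E[MS_within] = 1 + n * sum((mu_i - mu_bar)^2) / ((k - 1) * sigma^2)


def expected_f_ratio(means, n, sigma2):
    k = len(means)
    mu_bar = sum(means) / k
    between = n * sum((m - mu_bar) ** 2 for m in means)  # signal part of SS_between
    return 1 + between / ((k - 1) * sigma2)


# Hypothetical idealized setting: one shifted group among ten, versus that group plus one other
ten_groups = [0.0] * 10
ten_groups[3] = 0.5          # group "d"
two_groups = [0.0, 0.5]      # groups "a" and "d"

ratio_10 = expected_f_ratio(ten_groups, n=10, sigma2=1.0)
ratio_2 = expected_f_ratio(two_groups, n=10, sigma2=1.0)
print(f"10 groups: {ratio_10:.3f}, 2 groups: {ratio_2:.3f}")
```

The between-group signal is actually larger with ten groups (2.25 versus 1.25 in units of σ²), but it is spread over nine degrees of freedom instead of one, so the expected F ratio drops from 2.25 to 1.25.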

How about the reverse? This is easier: imagine three groups with means equal to -1, 0 and 1, so the overall mean is 0. Each group separately does not necessarily have a significant difference from 0, but the difference between groups 1 and 3 can be large enough to produce a significant between-group variance.
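A deterministic Python sketch of this reverse case, with hypothetical data constructed for illustration: three groups of n = 10 with means -1, 0 and 1 and a common within-group standard deviation of 1.6. The coefficient t values are computed the way `lm(x ~ 0 + g)` would (group mean divided by the pooled standard error, 27 residual df) and compared with 2.052, the standard table value of the 97.5% quantile of t with 27 df. The group comparison is the conventional one-way ANOVA F test; because it has 2 numerator df, its p-value has the closed form $(1 + 2F/d_2)^{-d_2/2}$.

```python
import math

# Hypothetical data: fixed deviations rescaled to sample SD 1, then scaled and shifted.
n, sd_within = 10, 1.6
raw = [k - 4.5 for k in range(n)]
raw_sd = math.sqrt(sum(v * v for v in raw) / (n - 1))
z = [v / raw_sd for v in raw]
groups = {mu: [mu + sd_within * v for v in z] for mu in (-1.0, 0.0, 1.0)}

k = len(groups)
df_res = k * n - k                                 # 27 residual df
means = {mu: sum(xs) / n for mu, xs in groups.items()}
ss_within = sum(sum((x - means[mu]) ** 2 for x in xs) for mu, xs in groups.items())
mse = ss_within / df_res                           # pooled residual variance

# Coefficient t values as in lm(x ~ 0 + g): each group mean tested against 0
t_values = {mu: m / math.sqrt(mse / n) for mu, m in means.items()}
T_CRIT_27 = 2.052                                  # table value of t(0.975, 27)

# Conventional one-way ANOVA F test comparing the groups with each other
grand = sum(means.values()) / k
ss_between = n * sum((m - grand) ** 2 for m in means.values())
f_stat = (ss_between / (k - 1)) / mse
p_value = (1 + 2 * f_stat / df_res) ** (-df_res / 2)   # exact only for 2 numerator df

for mu, t in t_values.items():
    print(f"group mean {mu:+.0f}: t = {t:+.3f}, significant: {abs(t) > T_CRIT_27}")
print(f"one-way ANOVA: F = {f_stat:.3f} on {k - 1} and {df_res} df, p = {p_value:.4f}")
```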

Solved – Why do ANOVAs and GLM (negative binomial model) give different results for interaction effects

If you only have two categorical variables and you include the interaction between the two, then the nbreg and anova are equivalent in the sense that they produce exactly the same predicted values. As a consequence both models are "equally right" or "equally wrong".

sysuse nlsw88, clear

gen byte black = race == 2 if race < 3

anova wage black##union
predict yhat_anova

nbreg wage black##union
predict yhat_nbreg

tab yhat_anova yhat_nbreg

    Fitted |         Predicted number of events
    values |  5.968046     7.5821   8.614504   8.735929 |     Total
-----------+--------------------------------------------+----------
  5.968046 |       350          0          0          0 |       350 
    7.5821 |         0      1,051          0          0 |     1,051 
  8.614504 |         0          0        151          0 |       151 
  8.735929 |         0          0          0        302 |       302 
-----------+--------------------------------------------+----------
     Total |       350      1,051        151        302 |     1,854

This is not just true for a model with two categorical variables and their interaction; it is true for any fully saturated model, i.e. any model that includes only categorical variables and all their interactions, including all higher-order interactions.

Edit:

In order to see why these models are just different ways of saying the same thing, it is easiest to go back to the basics. Consider the example above: both models model the mean outcome, in this case mean wage. In both models there are only 4 means to model: black non-union, black union, white non-union and white union. Both models model these 4 means with 4 parameters (constant, main effect for union, main effect for black, and the interaction term). As a consequence, neither model imposes any constraint on these means.

Let's start by just looking at these means:

table  union black, c(mean wage)

------------------------------
union     |       black       
worker    |        0         1
----------+-------------------
 nonunion |   7.5821  5.968046
    union | 8.735929  8.614504
------------------------------

So, among non-union members there is a noticeable difference in mean wage between black and white persons, while among union members this difference is a lot smaller.

Let's look at the regression output:

. reg wage i.union##i.black

      Source |       SS       df       MS              Number of obs =    1854
-------------+------------------------------           F(  3,  1850) =   29.83
       Model |  1472.83336     3  490.944452           Prob > F      =  0.0000
    Residual |  30445.6086  1850  16.4570858           R-squared     =  0.0461
-------------+------------------------------           Adj R-squared =  0.0446
       Total |   31918.442  1853   17.225279           Root MSE      =  4.0567

------------------------------------------------------------------------------
        wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval] 
-------------+----------------------------------------------------------------
       union |
      union  |   1.153829   .2648625     4.36   0.000     .6343677    1.673289
     1.black |  -1.614053   .2503572    -6.45   0.000    -2.105066   -1.123041
             |
 union#black |
    union#1  |   1.492629   .4755625     3.14   0.002     .5599329    2.425324
             |
       _cons |     7.5821   .1251339    60.59   0.000     7.336681    7.827518
------------------------------------------------------------------------------

You can see that the mean wage for non-union white persons is the constant. The mean for black non-union persons is 1.61 dollars/hour lower, so their wage is:

di _b[_cons]+_b[1.black]
5.9680463

This corresponds to the mean reported in the table above. For union members, the negative effect of being black is reduced by 1.49 dollars/hour, i.e. black union members earn

di _b[1.black] + _b[1.union#1.black]
-.12142489

dollars/hour less than white union members, a number you can also recover from the table of means above.
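Because the model is saturated, the linear coefficients are just differences of the four cell means. A short Python sketch, using only the cell means from the `table` output above, recovers the `reg` coefficients up to rounding:

```python
# Cell means of wage from the table above, keyed by (union, black)
m = {(0, 0): 7.5821, (0, 1): 5.968046, (1, 0): 8.735929, (1, 1): 8.614504}

cons = m[0, 0]                                        # white non-union
b_union = m[1, 0] - m[0, 0]                           # union effect among whites
b_black = m[0, 1] - m[0, 0]                           # black effect among non-union
b_inter = (m[1, 1] - m[1, 0]) - (m[0, 1] - m[0, 0])   # change in black effect for union members

black_effect_union = b_black + b_inter                # black effect among union members
print(cons, b_union, b_black, b_inter, black_effect_union)
```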

Now consider a multiplicative model, in this case a Poisson regression with robust standard errors (for a justification of that choice, see here).

. poisson wage i.union##i.black, irr vce(robust)
note: you are responsible for interpretation of noncount dep. variable

Iteration 0:   log pseudolikelihood = -5239.2159  
Iteration 1:   log pseudolikelihood = -5239.2159  

Poisson regression                                Number of obs   =       1854
                                                  Wald chi2(3)    =     106.56
                                                  Prob > chi2     =     0.0000
Log pseudolikelihood = -5239.2159                 Pseudo R2       =     0.0188

------------------------------------------------------------------------------
             |               Robust
        wage |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       union |
      union  |   1.152178   .0360531     4.53   0.000     1.083638    1.225053
     1.black |   .7871232   .0265286    -7.10   0.000     .7368083     .840874
             |
 union#black |
    union#1  |   1.252791   .0759936     3.72   0.000     1.112359    1.410951
             |
       _cons |     7.5821   .1308473   117.39   0.000     7.329932    7.842942
------------------------------------------------------------------------------

Again we can see that the mean wage of white non-union workers is 7.5821. Black non-union persons earn about 21% less than white non-union persons, $(0.787 - 1) \times 100\% = -21\%$, a number you can recover from the table of means: $(5.968046 - 7.5821) / 7.5821 \times 100\% = -21\%$. For union members this ratio is multiplied by 1.25, i.e. increased by 25%, which means the effect of being black becomes less negative. So the effect of being black for union members is $1.253 \times 0.787 \approx 0.986$: black union members earn about 1.4% less than white union members. This is a number you could recover from the table of means or the regression output, both of which I will leave as an exercise to the reader.
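The exercise can be sketched the same way as for the linear model: in the saturated Poisson model the incidence-rate ratios are ratios of the cell means. A Python sketch using only the table of means:

```python
# Cell means of wage from the table above, keyed by (union, black)
m = {(0, 0): 7.5821, (0, 1): 5.968046, (1, 0): 8.735929, (1, 1): 8.614504}

irr_union = m[1, 0] / m[0, 0]                          # union ratio among whites
irr_black = m[0, 1] / m[0, 0]                          # black ratio among non-union
irr_inter = (m[1, 1] / m[1, 0]) / (m[0, 1] / m[0, 0])  # ratio of ratios

black_ratio_union = irr_black * irr_inter              # equals m[1, 1] / m[1, 0]
pct_diff_union = (black_ratio_union - 1) * 100         # percent difference among union members
print(irr_union, irr_black, irr_inter, round(pct_diff_union, 2))
```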
