Essentially, the question is how it can happen that one coefficient in the linear model is significantly different from 0 while the ANOVA shows no significant effect, and vice versa.
For this, let's consider a simpler example.
set.seed( 123 )
data <- data.frame( x= rnorm( 100 ), g= rep( letters[1:10], each= 10 ) )
data$x[ data$g == "d" ] <- data$x[ data$g == "d" ] + 0.5
boxplot( x ~ g, data )
l <- lm( x ~ 0 + g, data )
summary( l )
anova( l )
You can see that there is only one group (d) that stands out, i.e. has a coefficient significantly different from zero. However, given that the nine other groups show no effect, the ANOVA returns $p > 0.1$. Now let us remove some of the groups:
data2 <- data[ data$g %in% c( "a", "d" ), ]
anova( lm( x ~ 0 + g, data2 ) )
returns
Df Sum Sq Mean Sq F value Pr(>F)
g 2 6.8133 3.4066 5.7363 0.01182 *
Residuals 18 10.6898 0.5939
ANOVA considers the overall variance within and between the groups. In the first case (10 groups) the variance between the groups is smaller because of the many groups with no effect. In the second, there are only two groups, and all the between groups variance comes from the difference between these two groups.
How about the reverse? This is easier: imagine three groups with means equal to -1, 0, 1, so the total average is 0. No single group necessarily differs significantly from 0, but the difference between groups 1 and 3 can be enough to produce a significant between-group variance overall.
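A quick numerical check of this reverse case (sketched in Python with scipy rather than R, since the idea is language-agnostic; the data are built deterministically so the result is reproducible):

```python
import numpy as np
from scipy import stats

# Each group is the same +/-2 spread shifted by its group mean
# (-1.1, 0, 1.1), so the grand mean is 0.
spread = np.array([2.0] * 7 + [-2.0] * 7)
groups = [spread + m for m in (-1.1, 0.0, 1.1)]

# One-sample t-tests: no single group differs significantly from 0 ...
p_each = [stats.ttest_1samp(g, 0.0).pvalue for g in groups]
print(p_each)            # all > 0.05

# ... but the one-way ANOVA across the three groups is significant.
f, p_anova = stats.f_oneway(*groups)
print(p_anova)           # < 0.05
```

The within-group spread is large enough to mask each group's shift on its own, but pooling the between-group variance across all three groups pushes the F test over the threshold.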
If you have only two categorical variables and you include the interaction between the two, then nbreg and anova are equivalent in the sense that they produce exactly the same predicted values. As a consequence, both models are "equally right" or "equally wrong".
sysuse nlsw88, clear
gen byte black = race == 2 if race < 3
anova wage black##union
predict yhat_anova
nbreg wage black##union
predict yhat_nbreg
tab yhat_anova yhat_nbreg
Fitted | Predicted number of events
values | 5.968046 7.5821 8.614504 8.735929 | Total
-----------+--------------------------------------------+----------
5.968046 | 350 0 0 0 | 350
7.5821 | 0 1,051 0 0 | 1,051
8.614504 | 0 0 151 0 | 151
8.735929 | 0 0 0 302 | 302
-----------+--------------------------------------------+----------
Total | 350 1,051 151 302 | 1,854
This is not just true for a model with two categorical variables and their interaction; it is true for any fully saturated model, i.e. any model that includes only categorical variables and all their interactions, including all higher-order interactions.
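You can verify the linear half of this claim with a minimal sketch (Python/numpy here, on made-up data): OLS on a fully saturated design reproduces the cell means exactly. The same holds for any GLM, because the saturated likelihood is maximized at the cell means regardless of the link.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two binary factors and a fully saturated design:
# constant, a, b, and the interaction a*b.
a = rng.integers(0, 2, 200)
b = rng.integers(0, 2, 200)
y = 5 + 2 * a - b + 1.5 * a * b + rng.normal(0, 1, 200)

X = np.column_stack([np.ones_like(a), a, b, a * b]).astype(float)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta

# The fitted value in each of the 4 cells equals that cell's sample mean.
for i in (0, 1):
    for j in (0, 1):
        cell = (a == i) & (b == j)
        assert np.isclose(fitted[cell][0], y[cell].mean())
```

With 4 cells and 4 free parameters, the model can hit every cell mean exactly, which is why any two saturated models must agree on the predictions.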
Edit:
In order to see why these models are just different ways of saying the same thing, it is easiest to go back to the basics. Consider the example above: both models model the mean outcome, in this case mean wage. In both models there are only 4 means to model: black non-union, black union, white non-union, and white union. Both models model these 4 means with 4 parameters (the constant, a main effect for union, a main effect for black, and the interaction term). As a consequence, neither model imposes any constraint on these means.
Let's start by just looking at these means:
table union black, c(mean wage)
------------------------------
union | black
worker | 0 1
----------+-------------------
nonunion | 7.5821 5.968046
union | 8.735929 8.614504
------------------------------
So, among non-union members there is a noticeable difference in mean wage between black and white persons, while among union members this difference is a lot smaller.
Let's look at the regression output:
. reg wage i.union##i.black
Source | SS df MS Number of obs = 1854
-------------+------------------------------ F( 3, 1850) = 29.83
Model | 1472.83336 3 490.944452 Prob > F = 0.0000
Residual | 30445.6086 1850 16.4570858 R-squared = 0.0461
-------------+------------------------------ Adj R-squared = 0.0446
Total | 31918.442 1853 17.225279 Root MSE = 4.0567
------------------------------------------------------------------------------
wage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
union |
union | 1.153829 .2648625 4.36 0.000 .6343677 1.673289
1.black | -1.614053 .2503572 -6.45 0.000 -2.105066 -1.123041
|
union#black |
union#1 | 1.492629 .4755625 3.14 0.002 .5599329 2.425324
|
_cons | 7.5821 .1251339 60.59 0.000 7.336681 7.827518
------------------------------------------------------------------------------
You can see that the mean wage for non-union white persons is the constant. This mean is 1.61 dollars/hour less for black persons. So the wage for black persons is:
di _b[_cons]+_b[1.black]
5.9680463
This corresponds with the mean reported in the table above. For union members, the negative effect of being black is reduced by 1.49 dollars/hour, i.e. black union members earn
di _b[1.black] + _b[1.union#1.black]
-.12142489
dollars/hour less than white union members, a number you can also recover from the table of means above.
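These calculations can be checked directly from the four cell means. A quick sketch (Python here rather than Stata, with the cell means copied from the table above):

```python
# Cell means from the table: wage by union status and race.
m_wn, m_bn = 7.5821, 5.968046      # non-union: white, black
m_wu, m_bu = 8.735929, 8.614504    # union:     white, black

cons = m_wn                              # _cons
b_black = m_bn - m_wn                    # 1.black       -> -1.614054
b_union = m_wu - m_wn                    # 1.union       ->  1.153829
b_inter = (m_bu - m_wu) - (m_bn - m_wn)  # union#black   ->  1.492629

print(round(b_black, 6), round(b_union, 6), round(b_inter, 6))
```

Each coefficient is just a difference (or difference-in-differences) of cell means, matching the regression output up to rounding.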
Now consider a multiplicative model, in this case a Poisson regression with robust standard errors (for a justification of that choice, see here).
. poisson wage i.union##i.black, irr vce(robust)
note: you are responsible for interpretation of noncount dep. variable
Iteration 0: log pseudolikelihood = -5239.2159
Iteration 1: log pseudolikelihood = -5239.2159
Poisson regression Number of obs = 1854
Wald chi2(3) = 106.56
Prob > chi2 = 0.0000
Log pseudolikelihood = -5239.2159 Pseudo R2 = 0.0188
------------------------------------------------------------------------------
| Robust
wage | IRR Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
union |
union | 1.152178 .0360531 4.53 0.000 1.083638 1.225053
1.black | .7871232 .0265286 -7.10 0.000 .7368083 .840874
|
union#black |
union#1 | 1.252791 .0759936 3.72 0.000 1.112359 1.410951
|
_cons | 7.5821 .1308473 117.39 0.000 7.329932 7.842942
------------------------------------------------------------------------------
Again we can see that the mean wage of white non-union members is 7.5821. Black non-union persons earn $(0.787 - 1) \times 100\% = -21\%$ less than white non-union persons, a number you can recover from the table of means: $( 5.968046 - 7.5821 ) / 7.5821 \times 100\% = -21\%$. This negative effect of being black increases, which means it becomes less negative, by 25% if someone is a union member. So the effect of being black for union members is $1.2528 \times 0.7871 = 0.9861$, i.e. black union members earn about 1.4\% less than white union members. This number can also be recovered from the table of means or the regression output, both of which I will leave as an exercise to the reader.
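As before, the multiplicative coefficients can be recovered directly from the cell means; a quick sketch in Python (cell means copied from the table above):

```python
# Cell means from the table: wage by union status and race.
m_wn, m_bn = 7.5821, 5.968046      # non-union: white, black
m_wu, m_bu = 8.735929, 8.614504    # union:     white, black

irr_union = m_wu / m_wn                  # 1.union IRR     ~ 1.152178
irr_black = m_bn / m_wn                  # 1.black IRR     ~ 0.787123
irr_inter = (m_bu / m_wu) / irr_black    # interaction IRR ~ 1.252791

# Black union members earn about 1.4% less than white union members:
print(round(1 - m_bu / m_wu, 3))
```

In the multiplicative model each IRR is a ratio (or ratio-of-ratios) of cell means, just as each OLS coefficient was a difference of cell means.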
Best Answer
The fits for lm() and aov() are identical, but the reporting is different. The t tests are marginal: each tests the impact of a variable given the presence of all the other variables. The F tests are sequential: they test the importance of time in the presence of nothing but the intercept, of treat in the presence of the intercept and time, and of the interaction in the presence of all of the above.
Assuming you are interested in the significance of treat, I suggest you fit two models, one with it and one without, compare the two by putting both models in anova(), and use that F test. This will test treat and the interaction simultaneously.
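The model comparison that anova(m0, m1) performs is a partial F test on the nested models. A minimal sketch of the computation (in Python/numpy on hypothetical data with a time covariate and a binary treat, since the variable names here are just illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical data: continuous time, binary treatment.
n = 80
time = rng.normal(size=n)
treat = rng.integers(0, 2, n)
y = 1 + 0.5 * time + 0.8 * treat + 0.4 * time * treat + rng.normal(size=n)

def rss(X, y):
    """Residual sum of squares of an OLS fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ beta) ** 2)

ones = np.ones(n)
X0 = np.column_stack([ones, time])                       # without treat
X1 = np.column_stack([ones, time, treat, time * treat])  # with treat + interaction

# Partial F test: treat and treat:time jointly.
df_num = X1.shape[1] - X0.shape[1]
df_den = n - X1.shape[1]
F = ((rss(X0, y) - rss(X1, y)) / df_num) / (rss(X1, y) / df_den)
p = stats.f.sf(F, df_num, df_den)
print(F, p)
```

The numerator degrees of freedom is 2 because dropping treat removes both its main effect and the interaction, which is exactly why this comparison tests the two terms simultaneously.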
Consider the following: