Solved – Understanding what a factor is in a model

lme4-nlme, mixed-model

I have a model which I built with a number of factors as fixed-effect variables. Up to now they all had two levels (e.g. high tide/low tide), so when I ran the summary it showed one estimate per variable, with a 2 appended to the names of those coded with as.factor(). When I then ran a model with a four-level fixed effect (Tide level), I had a 'duh' moment and realised that the value shown is the effect of the second level relative to the first, and now, of course, also of the third and fourth levels. The issue is that I wanted to interpret the results as shown in this results table:

Table 3. Model-averaged parameter estimates and relative importance
values for variables affecting adult piping plover foraging rates in New Jersey,
2007–2009.
Parameter           Estimate   95% CI
Intercept              11.78   (10.07, 13.49)
Habitat
    Intertidal          3.97   (2.45, 5.49)
    Wrack               1.37   (-0.46, 3.20)
    Ephemeral pool      2.65   (-4.62, 9.92)
    Tidal pond          5.52   (3.84, 7.20)
    Bay shore           2.32   (0.03, 4.61)
    Sand flat          -2.30   (-4.34, -0.26)
Tidal stage
    Low                 3.98   (3.05, 4.91)
    High                1.62   (-1.36, 4.60)
Wind speed              0.01   (-0.02, 0.04)
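For reference, the Tide2/Tide3/Tide4 labelling comes from R's default treatment contrasts: the first level of a factor is absorbed into the intercept, and each remaining coefficient is the difference between that level and the first. A minimal sketch on made-up data (the values and the names `tide`, `y`, `fit`, `level_means` are hypothetical), using base-R lm() so that it runs without lme4; lmer codes its fixed effects with the same default contrasts.

```r
# Hypothetical data: a four-level factor standing in for Tide, three
# observations per level, with level means 10, 12, 6 and 3.
tide <- factor(rep(1:4, each = 3))
y <- c(9, 10, 11,  11, 12, 13,  5, 6, 7,  2, 3, 4)

# Default (treatment) contrasts: level 1 has no column of its own
contrasts(tide)

fit <- lm(y ~ tide)
coef(fit)
# (Intercept) is the mean of level 1 (10); tide2, tide3 and tide4 are the
# differences of levels 2-4 from level 1 (2, -4 and -7)

# The same numbers from the raw level means
level_means <- tapply(y, tide, mean)
level_means - level_means[1]
```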

Note that the caption says it shows the relative importance values too, but I don't see them. Presumably they ran a model something like Foraging Rate ~ Habitat + Tide Level + Wind Speed + (1|Site).
The main issue is: how do I get an effect value for the first category within a variable (i.e. 'Tide1' in my model, or Low tide or Intertidal in the example above)?
Here is an example of my results:

> testm1<-lmer(Feeding~Age+Tide+mean.catch.rate+mean.for.rate+(1|Brood), data=ABMnoD, REML=FALSE)
> testm1
Linear mixed model fit by maximum likelihood 
Formula: Feeding ~ Age + Tide + mean.catch.rate + mean.for.rate + (1 |      Brood) 
   Data: ABMnoD 
   AIC   BIC logLik deviance REMLdev
 350.3 366.1 -166.1    332.3   312.5
Random effects:
 Groups   Name        Variance Std.Dev.
 Brood    (Intercept)   0.00    0.00   
 Residual             132.93   11.53   
Number of obs: 43, groups: Brood, 7

Fixed effects:
                 Estimate Std. Error t value
(Intercept)      94.05421    5.76798  16.306
Age              -0.38108    0.31678  -1.203
Tide2             2.01871    5.38376   0.375
Tide3            -4.42228    5.34896  -0.827
Tide4           -13.03191    5.54832  -2.349
mean.catch.rate   0.88214    1.66752   0.529
mean.for.rate    -0.09334    1.13695  -0.082

Correlation of Fixed Effects:
            (Intr) Age    Tide2  Tide3  Tide4  mn.ct.
Age         -0.282                                   
Tide2       -0.433 -0.133                            
Tide3       -0.314 -0.189  0.514                     
Tide4       -0.308 -0.405  0.510  0.558              
men.ctch.rt  0.072 -0.205  0.280  0.423  0.347       
mean.for.rt -0.236  0.070 -0.218 -0.386 -0.260 -0.956

Because I have been running dredge() to find the best-fitting models, I also end up with results in the following format from model.avg():

Component models:
   df  logLik   AICc Delta Weight
3   6 -167.43 349.19  0.00   0.55
13  7 -166.86 350.92  1.72   0.23
23  7 -166.90 351.00  1.80   0.22

Term codes:
mean.catch.rate   mean.for.rate            Tide 
              1               2               3 

Model-averaged coefficients: 
                Estimate Std. Error z value Pr(>|z|)    
(Intercept)      94.6299     5.0785  18.633  < 2e-16 ***
Tide2             0.3714     5.2833   0.070 0.943965    
Tide3            -6.1620     4.9492   1.245 0.213109    
Tide4           -16.4001     4.9791   3.294 0.000989 ***
mean.catch.rate   0.4728     0.4387   1.078 0.281208    
mean.for.rate     0.3170     0.3052   1.039 0.298933    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Full model-averaged coefficients (with shrinkage): 
 (Intercept)      Tide2      Tide3      Tide4 mean.catch.rate mean.for.rate
   94.629898   0.371350  -6.162041 -16.400131        0.109312      0.070415

Relative variable importance:
    (Intercept)             Age mean.catch.rate   mean.for.rate            Tide 
           1.00            0.00            0.23            0.22            1.00

And the confidence intervals:

> confint(avgmodD2)
                      2.5 %      97.5 %
(Intercept)      84.6762427 104.5835534
Tide2            -9.9837967  10.7264976
Tide3           -15.8622641   3.5381824
Tide4           -26.1590703  -6.6411917
mean.catch.rate  -0.3870958   1.3326081
mean.for.rate    -0.2811417   0.9151314

I am just not sure how to get the values for the first level of each categorical fixed effect, or whether the other values (e.g. for Tide3) need to be adjusted relative to it. I cannot find any documentation on how to do this.
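A sketch of the arithmetic, using the fixed-effect estimates printed for testm1 above. Assuming the default treatment contrasts, Tide1 is the reference level: its value is the intercept, and each other level's value is the intercept plus its own coefficient, so the Tide2 to Tide4 coefficients themselves need no adjustment. Two caveats: because the model also contains Age, mean.catch.rate and mean.for.rate, these are predictions with those covariates set to 0; and standard errors or confidence intervals for the summed values need the full covariance matrix (vcov()), so this sketch covers point estimates only. The names `est` and `tide_levels` are introduced here for illustration.

```r
# Fixed-effect estimates copied from the testm1 summary above
est <- c("(Intercept)" = 94.05421, Tide2 = 2.01871,
         Tide3 = -4.42228, Tide4 = -13.03191)

# Treatment contrasts: Tide1 is the reference level, so its value is the
# intercept; every other level is intercept + its own coefficient.
# These are predictions with Age, mean.catch.rate and mean.for.rate all at 0,
# because that is where the intercept is defined.
tide_levels <- c(Tide1 = est[["(Intercept)"]],
                 est[["(Intercept)"]] + est[c("Tide2", "Tide3", "Tide4")])
tide_levels

# Differences between non-reference levels are just differences of
# coefficients, so Tide2..Tide4 need no further adjustment
tide_levels[["Tide3"]] - tide_levels[["Tide2"]]
est[["Tide3"]] - est[["Tide2"]]
```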

I would appreciate it if someone could spend a little time explaining this to me.

Thank you.

Rachel

Amendment in response to the answer by jbowman:

I ran all four options: with and without the intercept, and each of these again with the mean of y removed. The relative importance of each factor stayed the same in every version:

Regular model average, with the intercept retained:

Model-averaged coefficients: 
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  98.3228     4.5234  21.736  < 2e-16 ***
Tide2        -0.1183     5.4797   0.022  0.98277    
Tide3        -5.9914     5.2249   1.147  0.25151    
Tide4       -16.2022     5.2832   3.067  0.00216 ** 
MF.vs.OF2    -6.1143     4.5997   1.329  0.18376    

Full model-averaged coefficients (with shrinkage): 
 (Intercept)     Tide2     Tide3     Tide4 MF.vs.OF2
    98.32281  -0.11833  -5.99141 -16.20216  -2.37172

Without intercept:

          Estimate Std. Error z value Pr(>|z|)    
Tide1       96.742      3.601  26.866   <2e-16 ***
Tide2       59.097     47.373   1.247    0.212    
Tide3       53.224     46.893   1.135    0.256    
Tide4       43.014     46.626   0.923    0.356    
MF.vs.OF1  100.818      4.703  21.436   <2e-16 ***
MF.vs.OF2   94.704      3.882  24.398   <2e-16 ***

Full model-averaged coefficients (with shrinkage): 
  Tide1  Tide2  Tide3  Tide4 MF.vs.OF1 MF.vs.OF2
 59.216 59.097 53.224 43.014    39.107    36.735

With the mean of y removed (y centred) but the intercept retained:

            Estimate Std. Error z value Pr(>|z|)   
(Intercept)   7.5223     4.5234   1.663  0.09632 . 
Tide2        -0.1183     5.4797   0.022  0.98277   
Tide3        -5.9914     5.2249   1.147  0.25151   
Tide4       -16.2022     5.2832   3.067  0.00216 **
MF.vs.OF2    -6.1143     4.5997   1.329  0.18376   

Full model-averaged coefficients (with shrinkage): 
 (Intercept)     Tide2     Tide3     Tide4 MF.vs.OF2
     7.52235  -0.11833  -5.99141 -16.20216  -2.37172

And lastly, with the mean of y removed and the intercept also removed:

Model-averaged coefficients: 
          Estimate Std. Error z value Pr(>|z|)  
Tide1        5.941      3.601   1.650   0.0990 .
Tide2        3.518      5.520   0.637   0.5239  
Tide3       -2.355      5.013   0.470   0.6385  
Tide4      -12.566      4.917   2.556   0.0106 *
MF.vs.OF1   10.017      4.703   2.130   0.0332 *
MF.vs.OF2    3.903      3.882   1.006   0.3146  

Full model-averaged coefficients (with shrinkage): 
    Tide1    Tide2    Tide3    Tide4 MF.vs.OF1 MF.vs.OF2
   3.6366   3.5183  -2.3548 -12.5655    3.8857    1.5140

Note that the p-values change between versions but the RVIs do not. I am not sure how to continue or which version I should use. Could I calculate the intercept and the relative estimates from these values?
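For a single fitted model the two parameterisations carry the same information: without centring, the intercept of the "with intercept" fit equals the first level's absolute estimate, and each relative estimate is that level's absolute estimate minus the first level's. A minimal base-R sketch on made-up data (hypothetical values; lm() rather than lmer()) showing the conversion and that the fitted values and AIC are identical. The p-values differ because the tests differ: in the no-intercept fit each coefficient tests whether that level's mean is 0, whereas in the relative fit it tests whether the level differs from the reference level. I have not checked how this carries over to model-averaged coefficients, so treat it as a single-model sketch.

```r
# Hypothetical toy data: four-level factor, three observations per level
tide <- factor(rep(1:4, each = 3))
y <- c(9, 10, 11,  11, 12, 13,  5, 6, 7,  2, 3, 4)

fit_rel <- lm(y ~ tide)      # intercept + differences from level 1
fit_abs <- lm(y ~ tide - 1)  # one coefficient (level mean) per level

# Convert the absolute estimates into intercept + relative estimates
abs_est <- coef(fit_abs)
rel_from_abs <- c(abs_est[1], abs_est[-1] - abs_est[1])
rel_from_abs
coef(fit_rel)

# Same fitted values and same AIC: only the parameterisation differs
all.equal(unname(fitted(fit_rel)), unname(fitted(fit_abs)))
all.equal(AIC(fit_rel), AIC(fit_abs))
```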
Thank you.

Best Answer

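If you ask lmer not to estimate an intercept (the -1 in the formula below), R's formula machinery is smart enough to give you absolute, rather than relative, estimates for the first factor in the formula: it codes that factor with one indicator column per level. Any further factors are still estimated relative to a reference level. Some of the output below has been removed to save space. You should probably center the dependent variable before running the model without an intercept, as is done in the code below.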

> library(lme4)
> 
> # Construct sample data
> x <- as.factor(rep(c("A","B","C"), 10))
> z <- as.factor(rep(c("D","E","F","G","H"),6))
> y <- rnorm(30, as.numeric(x))
> 
> 
> # Run model with intercept: gives relative effects
> summary(lmer(y~x+(1|z)))
Linear mixed model fit by REML 
Formula: y ~ x + (1 | z) 
   AIC   BIC logLik deviance REMLdev
 92.76 99.77 -41.38    81.12   82.76
Random effects:
 Groups   Name        Variance Std.Dev.
 z        (Intercept) 0.00000  0.00000 
 Residual             0.97188  0.98584 
Number of obs: 30, groups: z, 5

Fixed effects:
            Estimate Std. Error t value
(Intercept)   1.5572     0.3118   4.995
xB            0.4971     0.4409   1.128
xC            1.1541     0.4409   2.618

> # Run model w/o intercept: gives absolute effects
> # First center y so no confounding of intercept with effects
> y <- scale(y, scale=FALSE)
> summary(lmer(y~x+(1|z)-1))
Linear mixed model fit by REML 
Formula: y ~ x + (1 | z) - 1 
   AIC   BIC logLik deviance REMLdev
 92.76 99.77 -41.38    81.12   82.76
Random effects:
 Groups   Name        Variance Std.Dev.
 z        (Intercept) 0.00000  0.00000 
 Residual             0.97188  0.98584 
Number of obs: 30, groups: z, 5

Fixed effects:
   Estimate Std. Error t value
xA -0.55042    0.31175  -1.766
xB -0.05329    0.31175  -0.171
xC  0.60371    0.31175   1.937

Otherwise the factor estimates have to be relative (although not necessarily relative to the first level; that depends on the contrasts in use), because with an intercept present, the intercept and a full set of level indicators are perfectly multicollinear.
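A quick way to see which columns R will estimate is to inspect model.matrix(); lmer builds its fixed-effect design the same way. In this sketch the factor names are borrowed from the question, but the data frame `d` and the names `cn1` to `cn3` are hypothetical. It shows that dropping the intercept gives one-column-per-level coding only to the first factor in the formula, and that relevel() chooses the reference level when the intercept is kept.

```r
# Hypothetical design; factor names borrowed from the question's output
d <- expand.grid(Tide = factor(1:4), MF.vs.OF = factor(1:2))

# Without an intercept, only the FIRST factor gets one column per level
cn1 <- colnames(model.matrix(~ Tide + MF.vs.OF - 1, data = d))
cn1   # "Tide1" "Tide2" "Tide3" "Tide4" "MF.vs.OF2"

# Put MF.vs.OF first and it is the one that gets full coding
cn2 <- colnames(model.matrix(~ MF.vs.OF + Tide - 1, data = d))
cn2   # "MF.vs.OF1" "MF.vs.OF2" "Tide2" "Tide3" "Tide4"

# With the intercept kept, relevel() chooses the reference level
d$Tide <- relevel(d$Tide, ref = "3")
cn3 <- colnames(model.matrix(~ Tide + MF.vs.OF, data = d))
cn3   # "(Intercept)" "Tide1" "Tide2" "Tide4" "MF.vs.OF2"
```

This may be why both MF.vs.OF1 and MF.vs.OF2 appear in the question's no-intercept model-averaged output: in component models that lack Tide, MF.vs.OF would be the first factor.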

Related Solutions

Mixed-Model Analysis – Understanding Random Effects with Zero Variance

You say that each individual had at least two measurements, but your output shows only 66 observations on 30 individuals, so at most six individuals had more than two measurements. Two is the absolute minimum you need to calculate a mean and a standard deviation (the random intercept is assumed to follow a Normal distribution), and estimates from just two values carry a LOT of uncertainty. Looking at the plot, you have at least five individuals with essentially zero variance, and at least five with a HUGE variance (probably because each has only two observations).
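The counting argument, plus a quick simulation of why two values per individual give such an unstable estimate of spread. The simulation is an illustration added here, not part of the original analysis: it draws many pairs from a standard normal (true SD 1) and looks at the sample SD of each pair. The names `extra` and `sd2` are introduced for illustration.

```r
# Counts taken from the text: 66 observations on 30 individuals,
# each individual measured at least twice.
n_obs <- 66
n_ind <- 30
extra <- n_obs - 2 * n_ind   # observations beyond the 2-per-individual minimum
extra                        # at most this many individuals have > 2 measurements

# How noisy is a standard deviation estimated from only two values?
# Simulate many pairs from a standard normal (true SD = 1).
set.seed(1)
sd2 <- replicate(10000, sd(rnorm(2)))
quantile(sd2, c(0.05, 0.5, 0.95))
# theoretical values: about 0.06, 0.67 and 1.96
```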

I'd say you have too little data, and it is too noisy. The "clear" differences you see are mostly illusory: with so little data, the estimates swing widely.

Solved – Mixed effects model output – no difference in AIC values

The first three models you've constructed differ in the ways the parameters are defined, but they have the same number of parameters and the fits are equivalent in every way except for the numerical values of the parameters.

We can illustrate this with a plain linear model; mixed models just complicate the issue.

set.seed(101)
dd <- expand.grid(light=c("day","dusk","night"),
                  tide=c("base","Flooding","Ebbing"))
dd$y <- rnorm(nrow(dd))
## add one more row so fit isn't perfect
dd <- rbind(dd,dd[1,])
dd$y[nrow(dd)] <- rnorm(1)

Use model.matrix to see what parameters R will construct when fitting the model (you could also use names(coef(...)) on the output of lm(), or names(fixef(...)) on the output of (g)lmer).

tmpf <- function(f) {
    model.matrix(f,data=dd)
}
colnames(m1 <- tmpf(~light+tide+light:tide))
## [1] "(Intercept)"             "lightdusk"              
## [3] "lightnight"              "tideFlooding"           
## [5] "tideEbbing"              "lightdusk:tideFlooding" 
## [7] "lightnight:tideFlooding" "lightdusk:tideEbbing"   
## [9] "lightnight:tideEbbing"

If we use the * operator, we get the interaction plus the main effects; if we redundantly specify the main effects, R silently drops them.

all.equal(m1,tmpf(~light*tide))  ## TRUE
all.equal(m1,tmpf(~light+light*tide))  ## TRUE
all.equal(m1,tmpf(~light+tide+light*tide))  ## TRUE

If we use : but leave out one of the main effects, we get the same number of parameters (9), but they are organized differently:

colnames(m2 <- tmpf(~light+light:tide))
## [1] "(Intercept)"             "lightdusk"              
## [3] "lightnight"              "lightday:tideFlooding"  
## [5] "lightdusk:tideFlooding"  "lightnight:tideFlooding"
## [7] "lightday:tideEbbing"     "lightdusk:tideEbbing"   
## [9] "lightnight:tideEbbing"
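To confirm that the two parameterisations are the same fit, one can fit both with lm() and compare fitted values and AIC. A sketch that rebuilds the same dd as above so it runs on its own; fit_full and fit_nest are names introduced here.

```r
set.seed(101)
dd <- expand.grid(light = c("day", "dusk", "night"),
                  tide = c("base", "Flooding", "Ebbing"))
dd$y <- rnorm(nrow(dd))
## add one more row so fit isn't perfect
dd <- rbind(dd, dd[1, ])
dd$y[nrow(dd)] <- rnorm(1)

fit_full <- lm(y ~ light * tide, data = dd)          # same columns as m1
fit_nest <- lm(y ~ light + light:tide, data = dd)    # same columns as m2

# Different coefficient names (9 each) ...
names(coef(fit_full))
names(coef(fit_nest))
# ... but identical fitted values and AIC
all.equal(unname(fitted(fit_full)), unname(fitted(fit_nest)))
all.equal(AIC(fit_full), AIC(fit_nest))
```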

As I explain elsewhere, it rarely makes sense to test a model with interactions present but main effects missing. The only way I know of to do it is to construct the dummy variables yourself, either by hand or by constructing the model matrix, dropping the terms you don't want, and using the remaining model-matrix columns as (numeric) predictor variables.
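A sketch of the model-matrix route just described, rebuilding dd as above. It only shows the mechanics (dropping the tide main-effect columns while keeping the interaction columns), not an endorsement of fitting such a model; X, X_sub and fit_manual are names introduced for illustration.

```r
set.seed(101)
dd <- expand.grid(light = c("day", "dusk", "night"),
                  tide = c("base", "Flooding", "Ebbing"))
dd$y <- rnorm(nrow(dd))
dd <- rbind(dd, dd[1, ])
dd$y[nrow(dd)] <- rnorm(1)

# Full model matrix: intercept, both main effects and the interaction
X <- model.matrix(~ light * tide, data = dd)

# Drop the tide main-effect columns but keep the interaction columns
X_sub <- X[, !colnames(X) %in% c("tideFlooding", "tideEbbing")]

# Fit using the remaining columns as numeric predictors; "- 1" because
# X_sub already contains the "(Intercept)" column
fit_manual <- lm(dd$y ~ X_sub - 1)
length(coef(fit_manual))   # 7 coefficients instead of 9
```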

The MuMIn package tries to do the right thing: from ?dredge,

By default, marginality constraints are respected, so “all possible combinations” include only those containing interactions with their respective main effects and all lower order terms.

library(MuMIn)    
full_model <- lm(y~light*tide,data=dd,na.action="na.fail")    
(dmods <- dredge(full_model))
## Model selection table 
##      (Int) lgh tid lgh:tid df logLik   AICc  delta weight
## 8 -0.27460   +   +       + 10 23.541 -247.1   0.00      1
## 1  0.24500                  2 -8.291   22.3 269.38      0
## 3 -0.16790       +          4 -5.948   27.9 274.98      0
## 2  0.07096   +              4 -7.821   31.6 278.72      0
## 4 -0.25820   +   +          6 -5.543   51.1 298.17      0

As you can see, dredge has not tried to fit any models that include the interaction but omit one of the main effects.
