Solved – Non significant intercept but significant coefficients in mixed effect modelling

Tags: lme4-nlme, mixed-model, p-value, r, satterthwaite

I am using mixed-effects models to predict a time series of data. I am using the lmerTest package in R, which overloads lmer(), to obtain p-values via the Satterthwaite approximation. The general model for each formula in R syntax is:

Dependent Variable ~ I(time^2) + time + (time | random effect)

For those not versed in R, this predicts my dependent variable using the fixed effects of time and time squared (to mimic a quadratic function), whilst allowing the linear coefficient and the intercept (the second and third coefficients of the quadratic) to vary per time series. All models are fitted by maximum likelihood.

My model appears to account for a reasonable amount of variance (marginal R^2 of about .06, conditional R^2 of about .94), but I'm having difficulty understanding the p-values.
My intercept is highly non-significant (p of about .76), but both coefficients return p < .001.

My questions are therefore:

What is the Satterthwaite approximation actually doing to create these values?

My fixed effects appear to be highly significant whilst my intercept isn't. How should I interpret this finding? My gut tells me this means that the model found good coefficients, so time can predict my DV, but that the intercepts the model found cannot be trusted to help with predictions?

Is there a better way to force p-values out of a mixed-effects model than this? I'm considering using the Anova() function from the car package, which mainly performs a Wald test.

How concerned should I be about the non-significant intercept, given that my question is whether my dependent variable in general tends to follow a concave polynomial shape over time?

Cheers.

Best Answer

It is not a good idea to be too concerned with p-values in mixed models. They are omitted from lme4 by the authors for good reasons, and "forcing" (as you put it) p-values out of the model is regarded by many as a very questionable thing to do. Moreover, since you appear to be focused on prediction rather than inference, a better approach may be to use cross-validation. Here I will quote Douglas Bates, the primary author of lme4, writing on the R-help mailing list some years ago:

Users are often surprised and alarmed that the summary of a linear mixed model fit by lmer provides estimates of the fixed-effects parameters, standard errors for these parameters and a t-ratio but no p-values. Similarly the output from anova applied to a single lmer model provides the sequential sums of squares for the terms in the fixed-effects specification and the corresponding numerator degrees of freedom but no denominator degrees of freedom and, again, no p-values.

Because they feel that the denominator degrees of freedom and the corresponding p-values can easily be calculated they conclude that failure to do this is a sign of inattention or, worse, incompetence on the part of the person who wrote lmer (i.e. me).

Perhaps I can try again to explain why I don't quote p-values or, more to the point, why I do not take the "obviously correct" approach of attempting to reproduce the results provided by SAS. Let me just say that, although there are those who feel that the purpose of the R Project - indeed the purpose of any statistical computing whatsoever - is to reproduce the p-values provided by SAS, I am not a member of that group. If those people feel that I am a heretic for even suggesting that a p-value provided by SAS could be other than absolute truth and that I should be made to suffer a slow, painful death by being burned at the stake for my heresy, then I suppose that we will be able to look forward to an exciting finale to the conference dinner at UseR!2006 next month. (Well, I won't be looking forward to such a finale but the rest of you can.)

As most of you know the t-statistic for a coefficient in the fixed-effects model matrix is the square root of an F statistic with 1 numerator degree of freedom so we can, without loss of generality, concentrate on the F statistics that were present in the anova output. Those who long ago took courses in "analysis of variance" or "experimental design" that concentrated on designs for agricultural experiments would have learned methods for estimating variance components based on observed and expected mean squares and methods of testing based on "error strata". (If you weren't forced to learn this, consider yourself lucky.) It is therefore natural to expect that the F statistics created from an lmer model (and also those created by SAS PROC MIXED) are based on error strata but that is not the case.

The parameter estimates calculated by lmer are the maximum likelihood or the REML (residual maximum likelihood) estimates and they are not based on observed and expected mean squares or on error strata. And that's a good thing because lmer can handle unbalanced designs with multiple nested or fully crossed or partially crossed grouping factors for the random effects. This is important for analyzing data from large observational studies such as occur in psychometrics.

There are many aspects of the formulation of the model and the calculation of the parameter estimates that are very interesting to me and have occupied my attention for several years but let's assume that the model has been specified, the data given and the parameter estimates obtained. How are the F statistics calculated? The sums of squares and degrees of freedom for the numerators are calculated as in a linear model. There is a slot in an lmer model that is similar to the "effects" component in a lm model and that, along with the "assign" attribute for the model matrix provides the numerator of the F ratio. The denominator is the penalized residual sum of squares divided by the REML degrees of freedom, which is n-p where n is the number of observations and p is the column rank of the model matrix for the fixed effects.

Now read that last sentence again and pay particular attention to the word "the" in the phrase "the penalized residual sum of squares". All the F ratios use the same denominator. Let me repeat that - all the F ratios use the same denominator. This is why I have a problem with the assumption (sometimes stated as more than just an assumption - something on the order of "absolute truth" again) that the reference distribution for these F statistics should be an F distribution with a known numerator degrees of freedom but a variable denominator degrees of freedom and we can answer the question of how to calculate a p-value by coming up with a formula to assign different denominator degrees of freedom for each test. The denominator doesn't change. Why should the degrees of freedom for the denominator change?

Most of the research on tests for the fixed-effects specification in a mixed model begin with the assumption that these statistics will have an F distribution with a known numerator degrees of freedom and the only purpose of the research is to decide how to obtain an approximate denominator degrees of freedom. I don't agree.

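Anyway, to answer the questions at hand, Satterthwaite's method is a way to approximate the denominator degrees of freedom that Douglas Bates was describing above, so that each t or F statistic can be referred to a t or F distribution and a p-value read off.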
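As a rough sketch of what the approximation does (the notation here is assumed for illustration and is not taken from the question): for a single fixed-effect coefficient or contrast L'β, the t statistic is formed as usual, and the degrees of freedom are chosen so that the estimated variance in the denominator behaves like a scaled chi-square variable with matching first two moments.

```latex
t = \frac{L^\top \hat\beta}{\sqrt{L^\top \hat{C}\, L}},
\qquad
\nu \approx \frac{2\,\bigl(L^\top \hat{C}\, L\bigr)^{2}}
                 {\widehat{\operatorname{Var}}\bigl(L^\top \hat{C}\, L\bigr)}
```

Here \hat{C} is the estimated covariance matrix of the fixed-effect estimates, so for a single coefficient L^\top \hat{C} L is just its squared standard error. The variance in the denominator of \nu is usually obtained by a delta-method approximation from the estimated covariance of the variance parameters. The p-value then comes from a t distribution with \nu degrees of freedom, which is exactly the step Bates objects to above.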

As for the non-significant fixed intercept, one way to interpret this is that, at some arbitrary level of significance (perhaps 5% if you follow the convention in many fields), the data are consistent with an intercept of zero. Perhaps if you had a larger sample, it would be distinguishable from zero (one reason for not relying heavily on p-values in general, not just in mixed models). In other words, perhaps the actual data-generating process that you are modelling results in an expected value of zero when the other covariates are also zero (and that scenario may be total nonsense in this particular study, or it may be fine). A plot of the data may be very revealing regarding this.
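To see why the intercept test says little about the shape of the curve, it may help to write out the fixed part of the model and shift the time origin. This is a sketch under assumed notation: \beta_0, \beta_1, \beta_2 denote the fixed intercept, linear and quadratic coefficients, and c is a hypothetical constant by which time is recentred.

```latex
\mathbb{E}[y \mid t] = \beta_0 + \beta_1 t + \beta_2 t^2,
\qquad t = t' + c
\;\Longrightarrow\;
\mathbb{E}[y \mid t'] =
\underbrace{(\beta_0 + \beta_1 c + \beta_2 c^2)}_{\text{new intercept}}
+ \underbrace{(\beta_1 + 2\beta_2 c)}_{\text{new linear term}}\, t'
+ \beta_2\, t'^2
```

The intercept is simply the expected value at time zero, so its estimate and p-value change with the choice of origin, whereas the quadratic coefficient \beta_2 does not. A question about whether the curve is concave concerns the sign of \beta_2, not the intercept.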

I would also question whether it is a good idea to fit random slopes for the linear term but not the quadratic term. By doing so, you are allowing each group to have its own linear term, yet the curvature is constrained to be the same for every group, so you are allowing a shift in each parabola but fixing its shape. Is this indicated by the relevant theory of whatever data-generating process you are modelling?
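Written out, and assuming the model is as described in the question (the subscripts and symbols are introduced here for illustration), observation i in time series j is modelled as:

```latex
y_{ij} = (\beta_0 + b_{0j}) + (\beta_1 + b_{1j})\, t_{ij} + \beta_2\, t_{ij}^2 + \varepsilon_{ij},
\qquad
\begin{pmatrix} b_{0j} \\ b_{1j} \end{pmatrix} \sim \mathcal{N}(0, \Sigma),
\quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)

t^{*}_{j} = -\,\frac{\beta_1 + b_{1j}}{2\,\beta_2}
```

Each series gets its own vertical offset b_{0j} and its own turning point t^{*}_{j}, but every series shares the same curvature \beta_2. Letting the shape vary as well would mean replacing \beta_2 with \beta_2 + b_{2j}, at the cost of three more variance-covariance parameters to estimate.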
