Solved – How to easily interpret the F-statistic degrees of freedom

degrees of freedom, descriptive statistics, lm, r

How to "easily" interpret the F-statistic degrees of freedom (such as given in R's summary.lm)?

By "easily" I mean that rather than answering with some theoretical treatment on degrees of freedom of the F-statistic, which can be found from the internet, what are the F-statistic's degrees of freedom useful for when trying to infer aspects of a linear model?

E.g. the p-value of the F-statistic gives information about whether all the variables contribute significantly to the model output. So what can the degrees of freedom be used for?

Best Answer

You don't usually "use" the degrees of freedom for anything. They're needed to work out the p-value of the F statistic, since every F distribution (each identified by its two degrees-of-freedom parameters) is different. Since R, like most other statistical software, calculates that p-value for you, you don't necessarily need them for anything else.

That said, it depends on what else you want to do. There are many possible further analyses, and some of them would use those df numbers, so the question risks becoming overly broad.
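As a small illustration of why the two df parameters matter, the sketch below evaluates the same F value under two different F distributions with base R's pf(). The F value of 3 and the df pairs are made-up numbers, not taken from any model in the question.

```r
# Same observed F value, different degrees of freedom (illustrative numbers).
f_value <- 3

# Upper-tail probability P(F > f_value) under two different F distributions.
p_small <- pf(f_value, df1 = 2, df2 = 10,  lower.tail = FALSE)
p_large <- pf(f_value, df1 = 2, df2 = 100, lower.tail = FALSE)

print(c(df2_10 = p_small, df2_100 = p_large))
```

The two tail probabilities differ even though the F value is identical, which is why the software has to report (or at least know) both df values.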

In regression, you can work out the two df parameters yourself before you fit the model; they're:

the number of predictors you fit (not counting the constant) and
the number of observations minus the number of predictors minus 1 (for the constant)

So that part of the output is not adding anything you shouldn't already know.
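A minimal sketch of that check in R, assuming simulated data with numeric predictors, no missing values and no collinearity (the variable names x1, x2, y and the sample size are hypothetical):

```r
# Simulated data: n observations, p predictors (all values are made up).
set.seed(1)
n <- 30
p <- 2
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- 1 + 2 * d$x1 + rnorm(n)

# Degrees of freedom worked out before fitting.
expected_df <- c(numdf = p, dendf = n - p - 1)

fit <- lm(y ~ x1 + x2, data = d)
fs  <- summary(fit)$fstatistic   # named vector: value, numdf, dendf

print(expected_df)
print(fs[c("numdf", "dendf")])

# The p-value shown by summary() can be recomputed from these three numbers.
p_value <- pf(fs[["value"]], fs[["numdf"]], fs[["dendf"]], lower.tail = FALSE)
print(p_value)
```

With factor predictors, the first df counts the fitted coefficients (excluding the intercept) rather than the number of variables.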

(I suggest you reconsider your objection to the "theoretical" considerations. They're frequently an essential part of understanding what you're doing.)

Related Solutions

Solved – Why are degrees of freedom so high in repeated measures mixed models

Unless I'm reading you incorrectly, you would have approximately this many degrees of freedom in a standard repeated measures ANOVA. The ANOVA error is the interaction between subjects and the effect and requires degrees of freedom from both. However, if you are making multiple measures at each time interval then yes, the degrees of freedom are much higher for the mixed effects model.

It's not the standard pseudoreplication problem you'd have if you were doing a repeated measures ANOVA. With the ANOVA you are only modelling the effects in question and should aggregate your data to get better estimates of each effect for each subject. With mixed effects modelling you are potentially modelling each data point, even those pseudoreplicated measurements you take for accuracy's sake. You can do that because you're explicitly saying these individual measures are grouped within subjects and within this factor, etc. Therefore, you can often have more degrees of freedom. Although, as I said above, it doesn't sound like that's the case here anyway.

Solved – Differences between PROC Mixed and lme / lmer in R – degrees of freedom

For the first question, the default method in SAS to find the df is not very smart; it looks for terms in the random effect that syntactically include the fixed effect, and uses that. In this case, since trt is not found in ind, it's not doing the right thing. I've never tried BETWITHIN and don't know the details, but either using the Satterthwaite option (satterth) or using ind*trt as the random effect gives correct results.

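/* Option 1: Satterthwaite approximation for the denominator df */
PROC MIXED data=Data;
    CLASS ind fac trt;
    MODEL y = trt /s ddfm=satterth;
    RANDOM ind /s;
run;

/* Option 2: use ind*trt as the random effect */
PROC MIXED data=Data;
    CLASS ind fac trt;
    MODEL y = trt /s;
    RANDOM ind*trt /s;
run;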

As for the second question, your SAS code doesn't quite match your R code; it only has a term for fac*ind, while the R code has a term for both ind and fac*ind. (See the Variance Components output to see this.) Adding this gives the same SE for trt in all models in both Q1 and Q2 (0.1892).

As you note, this is an odd model to fit as the fac*ind term has one observation for each level, so it is equivalent to the error term. This is reflected in the SAS output, where the fac*ind term has zero variance. This is also what the error message from lme4 is telling you; the reason for the error is that you most likely misspecified something, as you're including the error term in the model in two different ways. Interestingly, there is one slight difference in the nlme model; it's somehow finding a variance term for the fac*ind term in addition to the error term, but you will notice that the sum of these two variances equals the error term from both SAS and nlme without the fac*ind term. However, the SE for trt remains the same (0.1892) as trt is nested in ind, so these lower variance terms don't affect it.

Finally, a general note about degrees of freedom in these models: They are computed after the model is fit, and so differences in the degrees of freedom between different programs or options of a program do not necessarily mean that the model is being fit differently. For that, one must look at the estimates of the parameters, both fixed effect parameters and covariance parameters.

Also, using the t and F approximations with a given number of degrees of freedom is fairly controversial. Not only are there several ways to approximate the df, some believe the practice of doing so is not a good idea anyway. A few words of advice:

If everything is balanced, compare the results with the traditional least squares method, as they should agree. If it's close to balanced, compute them yourself (assuming balance) so that you can make sure the ones you're using are in the right ballpark.
If you have a large sample size, the degrees of freedom don't matter very much as the distributions get close to a normal and chi-squared.
Check out Doug Bates's methods for inference. His older method is based on MCMC simulation; his newer method is based on profiling the likelihood.
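The large-sample point in the list above can be seen numerically. The sketch below compares critical values from the t and F distributions with their normal and chi-squared limits using base R quantile functions; the df values and the numerator df k are arbitrary illustrative choices.

```r
# Critical values as the denominator df grows (df values are illustrative).
dfs <- c(5, 30, 1000)
k   <- 3  # numerator df, hypothetical

# t quantiles approach the standard normal quantile.
t_crit <- qt(0.975, df = dfs)
z_crit <- qnorm(0.975)

# k * F(k, df) quantiles approach the chi-squared(k) quantile.
f_crit    <- k * qf(0.95, df1 = k, df2 = dfs)
chi2_crit <- qchisq(0.95, df = k)

print(rbind(df = dfs, t = t_crit, normal = z_crit))
print(rbind(df = dfs, kF = f_crit, chisq = chi2_crit))
```

At small df the critical values differ noticeably from the limiting ones, so the choice of df approximation matters most in small samples.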
